On 10 October 2010 17:40, Andrew Beekhof <and...@beekhof.net> wrote:
> On Sun, Oct 10, 2010 at 12:47 AM, Pavlos Parissis
> <pavlos.paris...@gmail.com> wrote:
>> Hi,
>>
>> My resource is not started because I get this
>>
>> 00:44:27 crmd: [3141]: WARN: status_from_rc: Action 16
>> (pbx_02_monitor_0) on node-02 failed (target: 7 vs. rc: 5): Error
>>
>> but when I run manually the status I get 3, which ok because the
>> application is stopped
>>
>> [r...@node-02 ~]# /etc/init.d/znd-pbx_02 status
>> pbx_02 is stopped
>> [r...@node-02 ~]# echo $?
>> 3
>>
>> why does crm get error in this case?
>
> I imagine because when pacemaker ran it, the script didn't return 3.
>

Pacemaker got 5 because the script returns 5 when the application is not
available on the system, which happens only when the fs is not active.

What actually happened in this particular case is that the start actions on
the fs and on the resource that holds the application ran in the same
second. I am pretty sure that the start of the application resource went
too fast, and at the time the LSB script was executed the fs was not yet
available, even though the fs resource returned 0 on start and on the first
monitor.

This issue doesn't always happen, but if I put a sleep in the LSB script
for the application resource I don't run into it. The resources are in a
group with the order ip, fs, app.

I also removed exit code 5 from the LSB script, since it confuses the
cluster when the monitor action takes place on the slave node.
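
For illustration, here is a minimal sketch of what the status logic could look like after that change. It is not the actual znd-pbx_02 script: the function name pbx_status and the APP_BIN and PIDFILE paths are hypothetical placeholders. The idea is only that a missing binary (fs not mounted on this node) is reported as "stopped" (3) rather than "not installed" (5).

```shell
#!/bin/sh
# Hypothetical sketch of the status part of an LSB init script.
# APP_BIN and PIDFILE are illustrative defaults, not the real paths.
APP_BIN="${APP_BIN:-/srv/pbx/bin/pbx}"
PIDFILE="${PIDFILE:-/var/run/pbx_02.pid}"

pbx_status() {
    # If the binary is not visible (for example the shared fs is not
    # mounted on this node), report "stopped" (3) instead of 5, so a
    # probe on the passive node is not treated as an error.
    if [ ! -x "$APP_BIN" ]; then
        return 3
    fi
    # A pidfile that points at a live process means the service runs.
    if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        return 0
    fi
    # Otherwise the service is not running.
    return 3
}
```

With something like this, a status call on a node where the fs is not mounted returns 3, which is what the probe (pbx_02_monitor_0) expects from a stopped resource.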
Cheers,
Pavlos