On 19 Mar 2014, at 10:15 am, Andrew Beekhof <and...@beekhof.net> wrote:
>
> On 18 Mar 2014, at 10:04 pm, Gabriel Gomiz <ggo...@cooperativaobrera.coop> wrote:
>
>> Maybe, this is significant : 'Our DC node
>> (gandalf.san01.cooperativaobrera.coop) left the cluster' ... ?
>
> Very. I hadn't noticed it was the DC at the time it died.
>
>> Please tell me if you need more details:
>
> Can I get the file logs from lorien from Mar 08 08:43:00 to 09:14:00 please?
>

Riiiight, so this is the story:

Mar 08 08:43:22 [9934] lorien crmd: info: do_dc_takeover: Taking over DC status for this partition
Mar 08 08:43:22 [9934] lorien crmd: notice: tengine_stonith_notify: Peer gandalf was terminated (st_notify_fence) by mordor for gandalf: OK (ref=10d27664-33ed-43e0-a5bd-7d0ef850eb05) by client crmd.31561
Mar 08 08:43:22 [9934] lorien crmd: notice: tengine_stonith_notify: Notified CMAN that 'gandalf' is now fenced
Mar 08 08:43:22 [9934] lorien crmd: notice: tengine_stonith_notify: Target may have been our leader gandalf (recorded: <unset>)
Mar 08 09:13:52 [9934] lorien crmd: info: do_dc_takeover: Taking over DC status for this partition
Mar 08 09:13:52 [9934] lorien crmd: notice: do_dc_takeover: Marking gandalf, target of a previous stonith action, as clean

In tengine_stonith_notify() we potentially add things to stonith_cleanup_list
and then in do_dc_takeover() we check the stonith_cleanup_list and mark any
nodes in it as clean.

As you can see above, the stonith notification comes just after the call to
do_dc_takeover(). In the version you have there is some dodgy code in
tengine_stonith_notify() which incorrectly adds gandalf to
stonith_cleanup_list, causing Pacemaker to (incorrectly) erase its status
section at 9:13:52 when another election occurs.

This was fixed during the RC-phase of Pacemaker-1.1.10:

https://github.com/beekhof/pacemaker/commit/f30e1e43

I don't believe I quite understood the severity of that fix at the time
(otherwise I'd have made more noise about it).

Since you're on CentOS 6.4, there should already be updated packages that include this fix.
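
To make the ordering problem easier to see, here is a toy replay of the sequence from the log. It is only a sketch and not Pacemaker code: the ToyCrmd class, its fields, and the guard_when_dc flag are hypothetical, and the guard is merely one assumed way the stale entry could be avoided, not a description of what the commit above actually changes. The step where gandalf rejoins before the second election is also assumed for illustration.

```python
# Toy model of the event ordering in the log above. NOT Pacemaker code:
# the class, its fields, and the guard flag are hypothetical.

class ToyCrmd:
    def __init__(self, guard_when_dc=False):
        self.is_dc = False
        self.guard_when_dc = guard_when_dc  # hypothetical stand-in for a fix
        self.stonith_cleanup_list = []      # fencing targets awaiting cleanup
        self.status = {}                    # node name -> recorded status

    def do_dc_takeover(self):
        """Become DC and mark every queued fencing target as clean."""
        self.is_dc = True
        erased = list(self.stonith_cleanup_list)
        for node in erased:
            self.status.pop(node, None)     # "marking as clean" erases status
        self.stonith_cleanup_list.clear()
        return erased

    def tengine_stonith_notify(self, target):
        """Handle the notification that `target` has been fenced."""
        if self.is_dc and self.guard_when_dc:
            # Assumed behaviour: we are already DC, so clean up now
            # instead of queueing the target for a later takeover.
            self.status.pop(target, None)
            return
        self.stonith_cleanup_list.append(target)


def replay(guard_when_dc):
    crmd = ToyCrmd(guard_when_dc)
    crmd.status["gandalf"] = "online"
    first = crmd.do_dc_takeover()            # 08:43:22 takeover, list is empty
    crmd.tengine_stonith_notify("gandalf")   # 08:43:22 notification arrives after
    crmd.status["gandalf"] = "online again"  # assumed: gandalf has rejoined
    second = crmd.do_dc_takeover()           # 09:13:52 next election
    return first, second, crmd.status
```

Calling replay(False) shows the stale entry surviving until the second takeover, at which point gandalf's fresh status is erased; replay(True) leaves the rejoined node's status alone because nothing was queued.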