Andrew Beekhof <and...@beekhof.net> wrote:
> 18014 is where we're up to now, 16048 is the (old) one that scheduled the recurring monitor operation.
> I suspect you'll find the action failed earlier in the logs and that's why it needed to be restarted.
>
> Not the best log message though :(
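To make the two numbers concrete, here is a minimal sketch that pulls the transition ID out of the magic= field seen in the abort_transition_graph log line of the original report. It assumes the field is laid out as op_status:op_rc;action_id:transition_id:target_rc:te_uuid, which matches the values in that log line (rc=7, transition 16048) but is not confirmed anywhere in this thread. The names parse_magic and is_from_other_transition are hypothetical helpers, not part of Pacemaker.

```python
from typing import NamedTuple


class TransitionMagic(NamedTuple):
    """Fields of a magic string, under the layout assumed in the lead-in."""
    op_status: int
    op_rc: int
    action_id: int
    transition_id: int
    target_rc: int
    te_uuid: str


def parse_magic(magic: str) -> TransitionMagic:
    # Assumed layout (not confirmed by the thread):
    #   "<op_status>:<op_rc>;<action_id>:<transition_id>:<target_rc>:<te_uuid>"
    status_part, key_part = magic.split(";", 1)
    op_status, op_rc = status_part.split(":")
    action_id, transition_id, target_rc, te_uuid = key_part.split(":", 3)
    return TransitionMagic(
        int(op_status), int(op_rc),
        int(action_id), int(transition_id), int(target_rc),
        te_uuid,
    )


def is_from_other_transition(magic: str, current_transition: int) -> bool:
    # True when the operation was scheduled by a transition other than
    # the one the cluster is currently on (the "16048 vs. 18014" case).
    return parse_magic(magic).transition_id != current_transition


if __name__ == "__main__":
    magic = "0:7;104:16048:0:5fb57f01-3397-45a8-905f-c48cecdc8692"
    print(parse_magic(magic))
    print(is_from_other_transition(magic, 18014))
```

Read this way, the recurring monitor was scheduled once, back in transition 16048, and has been running on its own ever since; the failure it reported simply arrived while the cluster was at 18014.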
Thanks Andrew for the quick answer. I still need more info if possible.

I searched everywhere for transaction 16048 and I couldn't find a trace of it (looked for up to 5 days of logs prior to transaction 18014). It would have been good if we had timestamps for each transaction involved in this situation :-)

Is there a way to find out about this old transaction in any other logs (I looked into /var/log/messages on both nodes involved in this cluster)?

To give you an idea of how many transactions happened during this period:

TR_ID 18010 @ 21:52:16
...
TR_ID 18018 @ 22:55:25

Over an hour between these two events.

Given this, how come such a (very) old transaction (~2000 transactions before current one) only acts now? Could it be stale information in pacemaker?

Thanks in advance.

Youssef

Message: 4 from Pacemaker Digest, Vol 61, Issue 34
---------------------------------------------------------------
Date: Thu, 13 Dec 2012 10:52:42 +1100
From: Andrew Beekhof <and...@beekhof.net>
To: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
Subject: Re: [Pacemaker] Action from a different CRMD transition results in restarting services
Message-ID: <CAEDLWG2LtrPuxTRrd=jbv1sxtilbg3sb0nu0feyf3yrgrnc...@mail.gmail.com>
Content-Type: text/plain; charset=windows-1252

On Thu, Dec 13, 2012 at 6:31 AM, Latrous, Youssef <ylatr...@broadviewnet.com> wrote:
> Hi,
>
> I ran into the following issue and I couldn't find what it really means:
>
> Detected action msgbroker_monitor_10000 from a different transition: 16048 vs. 18014

18014 is where we're up to now, 16048 is the (old) one that scheduled the recurring monitor operation.
I suspect you'll find the action failed earlier in the logs and that's why it needed to be restarted.

Not the best log message though :(

> I can see that its impact is to stop/start a service but I'd like to understand it a bit more.
>
> Thank you in advance for any information.
>
> Logs about this issue:
>
> ...
>
> Dec 6 22:55:05 Node1 crmd: [5235]: info: process_graph_event: Detected action msgbroker_monitor_10000 from a different transition: 16048 vs. 18014
>
> Dec 6 22:55:05 Node1 crmd: [5235]: info: abort_transition_graph: process_graph_event:477 - Triggered transition abort (complete=1, tag=lrm_rsc_op, id=msgbroker_monitor_10000, magic=0:7;104:16048:0:5fb57f01-3397-45a8-905f-c48cecdc8692, cib=0.971.5) : Old event
>
> Dec 6 22:55:05 Node1 crmd: [5235]: WARN: update_failcount: Updating failcount for msgbroker on Node0 after failed monitor: rc=7 (update=value++, time=1354852505)
>
> Dec 6 22:55:05 Node1 crmd: [5235]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
>
> Dec 6 22:55:05 Node1 crmd: [5235]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
>
> Dec 6 22:55:05 Node1 crmd: [5235]: info: do_pe_invoke: Query 28069: Requesting the current CIB: S_POLICY_ENGINE
>
> Dec 6 22:55:05 Node1 crmd: [5235]: info: abort_transition_graph: te_update_diff:142 - Triggered transition abort (complete=1, tag=nvpair, id=status-Node0-fail-count-msgbroker, magic=NA, cib=0.971.6) : Transient attribute: update
>
> Dec 6 22:55:05 Node1 crmd: [5235]: info: do_pe_invoke: Query 28070: Requesting the current CIB: S_POLICY_ENGINE
>
> Dec 6 22:55:05 Node1 crmd: [5235]: info: abort_transition_graph: te_update_diff:142 - Triggered transition abort (complete=1, tag=nvpair, id=status-Node0-last-failure-msgbroker, magic=NA, cib=0.971.7) : Transient attribute: update
>
> Dec 6 22:55:05 Node1 crmd: [5235]: info: do_pe_invoke: Query 28071: Requesting the current CIB: S_POLICY_ENGINE
>
> Dec 6 22:55:05 Node1 attrd: [5232]: info: find_hash_entry: Creating hash entry for last-failure-msgbroker
>
> Dec 6 22:55:05 Node1 crmd: [5235]: info: do_pe_invoke_callback: Invoking the PE: query=28071, ref=pe_calc-dc-1354852505-39407, seq=12, quorate=1
>
> Dec 6 22:55:05 Node1 pengine: [5233]: notice: unpack_config: On loss of CCM Quorum: Ignore
>
> Dec 6 22:55:05 Node1 pengine: [5233]: notice: unpack_rsc_op: Operation txpublisher_monitor_0 found resource txpublisher active on Node1
>
> Dec 6 22:55:05 Node1 pengine: [5233]: WARN: unpack_rsc_op: Processing failed op msgbroker_monitor_10000 on Node0: not running (7)
>
> ...
>
> Dec 6 22:55:05 Node1 pengine: [5233]: notice: common_apply_stickiness: msgbroker can fail 999999 more times on Node0 before being forced off
>
> ...
>
> Dec 6 22:55:05 Node1 pengine: [5233]: notice: RecurringOp: Start recurring monitor (10s) for msgbroker on Node0
>
> ...
>
> Dec 6 22:55:05 Node1 pengine: [5233]: notice: LogActions: Recover msgbroker (Started Node0)
>
> ...
>
> Dec 6 22:55:05 Node1 crmd: [5235]: info: te_rsc_command: Initiating action 37: stop msgbroker_stop_0 on Node0
>
> Transition 18014 details:
>
> Dec 6 22:52:18 Node1 pengine: [5233]: notice: process_pe_message: Transition 18014: PEngine Input stored in: /var/lib/pengine/pe-input-3270.bz2
>
> Dec 6 22:52:18 Node1 crmd: [5235]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
>
> Dec 6 22:52:18 Node1 crmd: [5235]: info: unpack_graph: Unpacked transition 18014: 0 actions in 0 synapses
>
> Dec 6 22:52:18 Node1 crmd: [5235]: info: do_te_invoke: Processing graph 18014 (ref=pe_calc-dc-1354852338-39406) derived from /var/lib/pengine/pe-input-3270.bz2
>
> Dec 6 22:52:18 Node1 crmd: [5235]: info: run_graph: ====================================================
>
> Dec 6 22:52:18 Node1 crmd: [5235]: notice: run_graph: Transition 18014 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-3270.bz2): Complete
>
> Dec 6 22:52:18 Node1 crmd: [5235]: info: te_graph_trigger: Transition 18014 is now complete
>
> Dec 6 22:52:18 Node1 crmd: [5235]: info: notify_crmd: Transition 18014 status: done - <null>
>
> Dec 6 22:52:18 Node1 crmd: [5235]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
>
> Dec 6 22:52:18 Node1 crmd: [5235]: info: do_state_transition: Starting PEngine Recheck Timer
>
> Youssef
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

------------------------------

Message: 5
Date: Thu, 13 Dec 2012 01:17:17 +0000
From: Xavier Lashmar <xlash...@uottawa.ca>
To: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
Subject: Re: [Pacemaker] gfs2 / dlm on centos 6.2
Message-ID: <cc445c0ceb8b8a4c87297d880d8f903bbcc0f...@cms-p04.uottawa.o.univ>
Content-Type: text/plain; charset="windows-1252"

I see, thanks very much for pointing me in the right direction!

Xavier Lashmar
Université d'Ottawa / University of Ottawa
Analyste de Systèmes | Systems Analyst
Service étudiants, service de l'informatique et des communications | Student services, computing and communications services.
1 Nicholas Street (810)
Ottawa ON K1N 7B7
Tél. | Tel. 613-562-5800 (2120)

________________________________
From: Andrew Beekhof [and...@beekhof.net]
Sent: Tuesday, December 11, 2012 9:30 PM
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] gfs2 / dlm on centos 6.2

On Wed, Dec 12, 2012 at 1:29 AM, Xavier Lashmar <xlash...@uottawa.ca> wrote:

Hello,

We are attempting to mount gfs2 partitions on CentOS using DRBD + COROSYNC + PACEMAKER. Unfortunately we consistently get the following error:

You'll need to configure pacemaker to use cman for this.
See: http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/ch08s02.html

# mount /dev/vg_data/lv_data /webdata/ -t gfs2 -v
mount /dev/dm-2 /webdata
parse_opts: opts = "rw"
clear flag 1 for "rw", flags = 0
parse_opts: flags = 0
parse_opts: extra = ""
parse_opts: hostdata = ""
parse_opts: lockproto = ""
parse_opts: locktable = ""
gfs_controld join connect error: Connection refused
error mounting lockproto lock_dlm

We are trying to find out where to get the lock_dlm libraries and packages for CentOS 6.2 and 6.3.

Also, I found that the Fedora 17 version of the document "Pacemaker 1.1 - Clusters from Scratch" is a bit problematic. I'm also running a Fedora 17 system and found no package "dlm" as per the instructions in section 8.1.1:

yum install -y gfs2-utils dlm kernel-modules-extra

Any idea if an external repository is needed? If so, which one? And which package do we need to install for CentOS 6+?

Thanks very much

Xavier Lashmar
Analyste de Systèmes | Systems Analyst
Service étudiants, service de l'informatique et des communications / Student services, computing and communications services.
1 Nicholas Street (810)
Ottawa ON K1N 7B7
Tél. | Tel. 613-562-5800 (2120)

------------------------------

End of Pacemaker Digest, Vol 61, Issue 34
*****************************************