Not enough information. We'd need more than just the lrmd's logs; they only show what happened, not why.
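As a rough sketch of what "more than the lrmd's logs" might look like: the lrmd only executes the operations it is asked to run, so the lines from the other cluster daemons in the same time window are usually where the reason shows up. The helper below is hypothetical (the name `collect`, the daemon list, and the time window are my assumptions, not anything from the original report); it simply filters a syslog file by daemon tag and time of day.

```python
import re

# Syslog prefix, e.g. "Oct  7 16:58:29 ha1 lrmd: [3581]: info: ..."
# Group 1 is the time of day, group 2 is the daemon tag.
SYSLOG_RE = re.compile(r"^\w{3}\s+\d+\s+(\d{2}:\d{2}:\d{2})\s+\S+\s+([\w-]+)")

# Assumed set of daemons worth collecting alongside lrmd; adjust to
# whatever actually runs on your nodes.
DAEMONS = {"crmd", "pengine", "cib", "stonith-ng", "stonithd", "attrd", "lrmd"}


def collect(lines, start, end, daemons=DAEMONS):
    """Return the lines logged by `daemons` with start <= HH:MM:SS <= end."""
    kept = []
    for line in lines:
        m = SYSLOG_RE.match(line)
        # Zero-padded HH:MM:SS strings compare correctly as plain strings.
        if m and m.group(2) in daemons and start <= m.group(1) <= end:
            kept.append(line)
    return kept
```

For example, `collect(open("/var/log/messages"), "16:55:00", "17:00:00")` would return every matching line from a few minutes around the excerpt, which can then be attached together with the cluster configuration.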

On Thu, Oct 7, 2010 at 11:02 PM, Shravan Mishra <shravan.mis...@gmail.com> wrote:
> Hi,
>
> Description of my environment:
>    corosync=1.2.8
>    pacemaker=1.1.3
>    Linux= 2.6.29.6-0.6.smp.gcc4.1.x86_64 #1 SMP
>
> We are having a problem with our pacemaker which is continuously
> canceling the monitoring operation of our stonith devices.
>
> We ran:
>
> stonith -d -t external/safe/ipmi hostname=ha2.itactics.com
> ipaddr=192.168.2.7 userid=hellouser passwd=hello interface=lanplus -S
>
> it's output is attached as stonith.output.
>
> We have been trying to debug this issue for a few days now with no success.
> We are hoping that someone can help us as we are under immense
> pressure to move to RCS unless we can solve this issue in a day or two
> ,which I personally don't want to because we like the product.
>
> Any help will be greatly appreciated.
>
> Here is an excerpt from the /var/log/messages:
> =========================
> Oct 7 16:58:29 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11155: start
> Oct 7 16:58:29 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11156: monitor
> Oct 7 16:58:29 ha1 lrmd: [3581]: info: cancel_op: operation
> monitor[11156] on
> stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
> its parameters: CRM_meta_interval=[20000] target_role=[started]
> ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
> crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
> hostname=[ha2.itactics.com] passwd=[ft01st0...@]
> userid=[safe_ipmi_admin] cancelled
> Oct 7 16:58:29 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11157:
> stop
> Oct 7 16:58:29 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11158: start
> Oct 7 16:58:29 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11159: monitor
> Oct 7 16:58:29 ha1 lrmd: [3581]: info: cancel_op: operation
> monitor[11159] on
> stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
> its parameters: CRM_meta_interval=[20000] target_role=[started]
> ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
> crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
> hostname=[ha2.itactics.com] passwd=[ft01st0...@]
> userid=[safe_ipmi_admin] cancelled
> Oct 7 16:58:29 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11160:
> stop
> Oct 7 16:58:29 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11161: start
> Oct 7 16:58:29 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11162: monitor
> Oct 7 16:58:29 ha1 lrmd: [3581]: info: cancel_op: operation
> monitor[11162] on
> stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
> its parameters: CRM_meta_interval=[20000] target_role=[started]
> ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
> crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
> hostname=[ha2.itactics.com] passwd=[ft01st0...@]
> userid=[safe_ipmi_admin] cancelled
> Oct 7 16:58:29 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11163:
> stop
> Oct 7 16:58:29 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11164: start
> Oct 7 16:58:29 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11165: monitor
> Oct 7 16:58:29 ha1 lrmd: [3581]: info: cancel_op: operation
> monitor[11165] on
> stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
> its parameters: CRM_meta_interval=[20000] target_role=[started]
> ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
> crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
> hostname=[ha2.itactics.com] passwd=[ft01st0...@]
> userid=[safe_ipmi_admin] cancelled
> Oct 7 16:58:29 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11166:
> stop
> Oct 7 16:58:29 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11167: start
> Oct 7 16:58:29 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11168: monitor
> Oct 7 16:58:30 ha1 lrmd: [3581]: info: cancel_op: operation
> monitor[11168] on
> stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
> its parameters: CRM_meta_interval=[20000] target_role=[started]
> ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
> crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
> hostname=[ha2.itactics.com] passwd=[ft01st0...@]
> userid=[safe_ipmi_admin] cancelled
> Oct 7 16:58:30 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11169:
> stop
> Oct 7 16:58:30 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11170: start
> Oct 7 16:58:30 ha1 lrmd: [3581]: info: stonithRA plugin: got
> metadata: <?xml version="1.0"?> <!DOCTYPE resource-agent SYSTEM
> "ra-api-1.dtd"> <resource-agent name="external/safe/ipmi">
> <version>1.0</version> <longdesc lang="en"> ipmitool based power
> management. Apparently, the power off method of ipmitool is
> intercepted by ACPI which then makes a regular shutdown. If case of a
> split brain on a two-node it may happen that no node survives. For
> two-node clusters use only the reset method. </longdesc>
> <shortdesc lang="en">IPMI STONITH external device </shortdesc>
> <parameters> <parameter name="hostname" unique="1"> <content
> type="string" /> <shortdesc lang="en"> Hostname </shortdesc> <longdesc
> lang="en"> The name of the host to be managed by this STONITH device.
> </longdesc> </parameter> <parameter name="ipaddr" unique="1">
> <content type="string" /> <shortdesc lang="en"> IP Address
> </shortdesc> <longdesc lang="en"> The IP address of the STONITH
> device. </longdesc> </parameter> <parameter name="userid" unique="1">
> <content type="string" /> <shortdesc lang="en"> Login </shortdesc>
> <longdesc lang="en"> The username used for logging in to the STONITH
> device. </longdesc> </parameter> <parameter name="passwd" unique="1">
> <content type="string" /> <shortdesc lang="en"> Password </shortdesc>
> <longdesc lang="en"> The password used for logging in to the STONITH
> device. </longdesc> </parameter> <parameter name="interface"
> unique="1"> <content type="string" default="lan"/> <shortdesc
> lang="en"> IPMI interface </shortdesc> <longdesc lang="en"> IPMI
> interface to use, such as "lan" or "lanplus". </longdesc> </parameter>
> </parameters> <actions> <action name="start" timeout="15" />
> <action name="stop" timeout="15" /> <action name="status"
> timeout="15" /> <action name="monitor" timeout="15" interval="15"
> start-delay="15" /> <action name="meta-data" timeout="15" />
> </actions> <special tag="heartbeat"> <version>2.0</version>
> </special> </resource-agent>
> Oct 7 16:58:30 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11171: monitor
> Oct 7 16:58:30 ha1 lrmd: [3581]: info: cancel_op: operation
> monitor[11171] on
> stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
> its parameters: CRM_meta_interval=[20000] target_role=[started]
> ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
> crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
> hostname=[ha2.itactics.com] passwd=[ft01st0...@]
> userid=[safe_ipmi_admin] cancelled
> Oct 7 16:58:30 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11172:
> stop
> Oct 7 16:58:30 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11173: start
> Oct 7 16:58:30 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11174: monitor
>
> ==========================
>
> Thanks
>
> Shravan
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker