It's part of the requirements given to me that this solution be supported on servers without stonith devices, so I cannot enable stonith.
________________________________
From: Alan Robertson <al...@unix.sh>
To: ihjaz Mohamed <ihjazmoha...@yahoo.co.in>; The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
Sent: Monday, 24 October 2011 8:22 PM
Subject: Re: [Pacemaker] Cluster goes to (unmanaged) Failed state when both nodes are rebooted together

Setting no-quorum-policy to ignore and disabling stonith is not a good idea. You're sort of inviting the cluster to do screwed up things.

On 10/24/2011 08:23 AM, ihjaz Mohamed wrote:
>Hi All,
>
>I've pacemaker running with corosync. Following is my CRM configuration.
>
>node soalaba56
>node soalaba63
>primitive FloatingIP ocf:heartbeat:IPaddr2 \
>    params ip="<floating_ip>" nic="eth0:0"
>primitive acestatus lsb:acestatus \
>primitive pingd ocf:pacemaker:ping \
>    params host_list="<gateway_ip>" multiplier="100" \
>    op monitor interval="15s" timeout="5s"
>group HAService FloatingIP acestatus \
>    meta target-role="Started"
>clone pingdclone pingd \
>    meta globally-unique="false"
>location ip1_location FloatingIP \
>    rule $id="ip1_location-rule" pingd: defined pingd
>property $id="cib-bootstrap-options" \
>    dc-version="1.1.5-5.el6-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
>    cluster-infrastructure="openais" \
>    expected-quorum-votes="2" \
>    stonith-enabled="false" \
>    no-quorum-policy="ignore" \
>    last-lrm-refresh="1305736421"
>----------------------------------------------------------------------
>
>When I reboot both the nodes together, cluster goes into an (unmanaged) Failed
>state as shown below.
>
>============
>Last updated: Mon Oct 24 08:10:42 2011
>Stack: openais
>Current DC: soalaba63 - partition with quorum
>Version: 1.1.5-5.el6-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
>2 Nodes configured, 2 expected votes
>2 Resources configured.
>============
>
>Online: [ soalaba56 soalaba63 ]
>
> Resource Group: HAService
>     FloatingIP (ocf::heartbeat:IPaddr2) Started (unmanaged) FAILED[ soalaba63 soalaba56 ]
>     acestatus (lsb:acestatus): Stopped
> Clone Set: pingdclone [pingd]
>     Started: [ soalaba56 soalaba63 ]
>
>Failed actions:
>    FloatingIP_stop_0 (node=soalaba63, call=7, rc=1, status=complete): unknown error
>    FloatingIP_stop_0 (node=soalaba56, call=7, rc=1, status=complete): unknown error
>
>------------------------------------------------------------------------------
>
>This happens only when the reboot is done simultaneously on both the nodes. If
>reboot is done with some interval in between this is not seen. Looking into
>the logs I see that when the nodes come up resources are started on both the
>nodes and then it tries to stop the started resources and fails there.
>
>I've attached the logs.

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

--
Alan Robertson <al...@unix.sh>

"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce