Hi,

I've tried moving the corosync startup from S20 to S98, but the issue is still
there.

Maybe I'll have to remove it from init and write an Upstart job for corosync.
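A rough sketch of what such an Upstart job might look like. This assumes the Upstart 0.6.x that ships with RHEL 6 (inferred from the el6 build string in the dc-version), that corosync is first removed from SysV init (e.g. `chkconfig corosync off`), and that GATEWAY_IP is replaced by a real address. The job name and path are hypothetical. Note that `start on stopped rc` fires at about the same point as a late S98/S99 script, so the ordering alone may change little; the pre-start wait is what actually delays the start.

```
# /etc/init/corosync.conf -- hypothetical job name and path
description "corosync cluster engine (sketch)"

# Start only after the SysV rc scripts (including network) have finished
start on stopped rc RUNLEVEL=[2345]
stop on runlevel [016]

pre-start script
    # Wait up to ~60 s for a placeholder address (GATEWAY_IP) to answer;
    # if it never does, fall through and start corosync anyway.
    for i in $(seq 1 60); do
        ping -c 1 -W 1 GATEWAY_IP >/dev/null 2>&1 && exit 0
        sleep 1
    done
end script

# -f keeps corosync in the foreground so Upstart can track the process
exec /usr/sbin/corosync -f
```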



________________________________
From: Andreas Kurz <andr...@hastexo.com>
To: pacemaker@oss.clusterlabs.org
Sent: Tuesday, 25 October 2011 6:50 PM
Subject: Re: [Pacemaker] Cluster goes to (unmanaged) Failed state when both 
nodes are rebooted together

hello,

On 10/25/2011 09:17 AM, ihjaz Mohamed wrote:
> If I start corosync on both servers at the same time, it comes up fine.
> So I'm just wondering how this is different from corosync being started
> by the server during boot-up.

Maybe corosync is started too early during system boot, when network
connectivity is not yet fully established.
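One way to test that theory, sketched under assumptions: a small wrapper that waits until an address which only answers once the cluster network is really up responds to ping, and only then runs `service corosync start`. GATEWAY_IP and its default are hypothetical placeholders, and the 60 tries x 1 s budget is arbitrary; hooking the wrapper into the boot sequence is not shown.

```shell
#!/bin/sh
# Sketch: delay corosync start until the network actually answers.
# GATEWAY_IP is a hypothetical placeholder (192.0.2.1 is a documentation
# address); set it to something reachable only via the cluster interface.
GATEWAY_IP="${GATEWAY_IP:-192.0.2.1}"

# retry_until MAX_TRIES DELAY CMD [ARGS...]
# Runs CMD up to MAX_TRIES times, sleeping DELAY seconds between
# attempts. Returns 0 on the first success, 1 if every attempt fails.
retry_until() {
    max_tries=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$max_tries" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        [ "$i" -le "$max_tries" ] && sleep "$delay"
    done
    return 1
}

# Only act when invoked as "<script> start", so the function above can
# be sourced or tested without touching corosync.
if [ "${1:-}" = "start" ]; then
    if ! retry_until 60 1 ping -c 1 -W 1 "$GATEWAY_IP"; then
        echo "$GATEWAY_IP unreachable, starting corosync anyway" >&2
    fi
    service corosync start
fi
```

Falling through to `service corosync start` after the timeout is a deliberate choice in this sketch: an unreachable gateway should delay the cluster, not keep it down indefinitely.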

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> 
> 
> ------------------------------------------------------------------------
> *From:* Andreas Kurz <andr...@hastexo.com>
> *To:* pacemaker@oss.clusterlabs.org
> *Sent:* Monday, 24 October 2011 9:30 PM
> *Subject:* Re: [Pacemaker] Cluster goes to (unmanaged) Failed state when
> both nodes are rebooted together
> 
> hello,
> 
> On 10/24/2011 05:21 PM, ihjaz Mohamed wrote:
>> It's part of the requirements given to me to support this solution on
>> servers without STONITH devices, so I cannot enable STONITH.
> 
> Too bad; then you have to live with some limitations of this setup. You
> could add a random wait before corosync starts ... or simply: don't
> reboot them at the same time ;-)
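A minimal sketch of the random-wait idea, assuming a POSIX shell with `od` and `/dev/urandom` available. MAX_DELAY and its 30 s default are hypothetical values chosen for illustration, not recommendations from this thread.

```shell
#!/bin/sh
# Sleep a random 0..MAX_DELAY-1 seconds before starting corosync, so two
# nodes rebooted together are less likely to start at the same instant.
# MAX_DELAY is a hypothetical knob; it must be a positive integer.
MAX_DELAY="${MAX_DELAY:-30}"

# random_delay MAX: print a pseudo-random integer in [0, MAX).
# Reads two bytes from /dev/urandom so it also works in shells
# without $RANDOM (e.g. dash).
random_delay() {
    n=$(od -An -N2 -tu2 /dev/urandom | tr -d ' ')
    echo $((n % $1))
}

# Only act when invoked as "<script> start".
if [ "${1:-}" = "start" ]; then
    sleep "$(random_delay "$MAX_DELAY")"
    service corosync start
fi
```

This only lowers the odds of a simultaneous start; it does not remove the underlying limitation of a two-node cluster without STONITH.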
> 
> But it would also be interesting to know why FloatingIP_stop_0 returns
> an error on both nodes ... the logs should tell you what happened.
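To pull the relevant entries out of the logs, filtering on the resource name and the agent name is usually enough. A sketch; the log location is an assumption (often /var/log/messages on EL6, or whatever `logfile` is set to in the logging section of corosync.conf).

```shell
#!/bin/sh
# show_floatingip_log LOGFILE
# Print every line of LOGFILE that mentions the FloatingIP resource
# (e.g. its FloatingIP_stop_0 operation) or the IPaddr2 agent itself.
show_floatingip_log() {
    grep -E 'FloatingIP|IPaddr2' "$1"
}

# Run only when a log file is given, so the function can be sourced
# or tested on its own.
if [ -n "${1:-}" ]; then
    show_floatingip_log "$1"
fi
```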
> 
> ... and remove nic="eth0:0": you must not define an alias here, only
> the NIC itself.
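Applied to the configuration quoted further down, the primitive would then read roughly as follows, assuming eth0 is the interface on the floating IP's subnet (it could be changed with `crm configure edit FloatingIP`):

```
primitive FloatingIP ocf:heartbeat:IPaddr2 \
        params ip="<floating_ip>" nic="eth0"
```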
> 
> Regards,
> Andreas
> 
> 
> 
>>
>> ------------------------------------------------------------------------
>> *From:* Alan Robertson <al...@unix.sh>
>> *To:* ihjaz Mohamed <ihjazmoha...@yahoo.co.in>; The Pacemaker cluster
>> resource manager <pacemaker@oss.clusterlabs.org>
>> *Sent:* Monday, 24 October 2011 8:22 PM
>> *Subject:* Re: [Pacemaker] Cluster goes to (unmanaged) Failed state when
>> both nodes are rebooted together
>>
>> Setting no-quorum-policy to ignore and disabling stonith is not a good
>> idea.  You're sort of inviting the cluster to do screwed up things.
>>
>>
>> On 10/24/2011 08:23 AM, ihjaz Mohamed wrote:
>>> Hi All,
>>>
>>> I have Pacemaker running with corosync. The following is my CRM configuration.
>>>
>>> node soalaba56
>>> node soalaba63
>>> primitive FloatingIP ocf:heartbeat:IPaddr2 \
>>>        params ip="<floating_ip>" nic="eth0:0"
>>> primitive acestatus lsb:acestatus
>>> primitive pingd ocf:pacemaker:ping \
>>>        params host_list="<gateway_ip>" multiplier="100" \
>>>        op monitor interval="15s" timeout="5s"
>>> group HAService FloatingIP acestatus \
>>>        meta target-role="Started"
>>> clone pingdclone pingd \
>>>        meta globally-unique="false"
>>> location ip1_location FloatingIP \
>>>        rule $id="ip1_location-rule" pingd: defined pingd
>>> property $id="cib-bootstrap-options" \
>>>        dc-version="1.1.5-5.el6-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
>>>        cluster-infrastructure="openais" \
>>>        expected-quorum-votes="2" \
>>>        stonith-enabled="false" \
>>>        no-quorum-policy="ignore" \
>>>        last-lrm-refresh="1305736421"
>>> ----------------------------------------------------------------------
>>>
>>> When I reboot both the nodes together, cluster goes into an
>>> (unmanaged) Failed state as shown below.
>>>
>>>
>>> ============
>>> Last updated: Mon Oct 24 08:10:42 2011
>>> Stack: openais
>>> Current DC: soalaba63 - partition with quorum
>>> Version: 1.1.5-5.el6-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
>>> 2 Nodes configured, 2 expected votes
>>> 2 Resources configured.
>>> ============
>>>
>>> Online: [ soalaba56 soalaba63 ]
>>>
>>>  Resource Group: HAService
>>>      FloatingIP (ocf::heartbeat:IPaddr2) Started  (unmanaged) FAILED[  soalaba63      soalaba56 ]
>>>      acestatus  (lsb:acestatus):        Stopped
>>>  Clone Set: pingdclone [pingd]
>>>      Started: [ soalaba56 soalaba63 ]
>>>
>>> Failed actions:
>>>    FloatingIP_stop_0 (node=soalaba63, call=7, rc=1, status=complete): unknown error
>>>    FloatingIP_stop_0 (node=soalaba56, call=7, rc=1, status=complete): unknown error
>>>
> ------------------------------------------------------------------------------
>>>
>>> This happens only when both nodes are rebooted simultaneously. If the
>>> reboots are done with some interval in between, this is not seen.
>>> Looking into the logs, I see that when the nodes come up, resources are
>>> started on both nodes; the cluster then tries to stop the started
>>> resources and fails there.
>>>
>>> I've attached the logs.
>>>
>>>
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>
>>
>> --
>>    Alan Robertson <al...@unix.sh>
>>
>> "Openness is the foundation and preservative of friendship...  Let me
>> claim from you at all times your undisguised opinions." - William
>> Wilberforce
>>




