Re: [Pacemaker] Corosync & IPAddr problems(?)

Dejan Muhamedagic Mon, 07 Feb 2011 07:40:46 -0800

Hi,

On Mon, Feb 07, 2011 at 02:01:11PM +0100, Stephan-Frank Henry wrote:
> Hello again,
> 
> I may be having some problems with Corosync and IPaddr.
> To be more specific, when I run /etc/init.d/corosync stop, everything shuts 
> down more or less gracefully, but the virtual IP is never released (it is 
> still visible with ifconfig).
> 
> If I run 'sudo ifdown --force eth0:0', it works, so there should be no direct 
> reason for this.
> 
> This might not be a problem by itself, but I fear it could be related to 
> corosync's 'split-brain' handling when a network cable is disconnected.
> Though that might be something else, I'd rather eliminate all other problems 
> first and then see whether it fixes itself.
> 
> I have checked syslog, but nothing really jumps out.
> Are there any other logs or places where I can look?
> 
> Thanks, everyone!
> 
> Frank
> 
> (please scream if more or different info is needed)
> 
> -------------------------------------------------------------
> 
> OS: Debian Lenny 64bit, kernel version: 2.6.33.3
> Corosync: 1.2.1-1~bpo50+1
> cluster-glue: 1.0.6-1~bpo50+1
> libheartbeat2: 1:3.0.3-2~bpo50+1
> 
> Relevant cib.xml entry:
> <primitive id="ip_resource" class="ocf" type="IPaddr" provider="heartbeat">
>   <instance_attributes id="virtual-ip-attribs">
>     <attributes>
>       <nvpair id="virtual-ip-addr" name="ip" value="150.158.183.30"/>
>       <nvpair id="virtual-ip-addr-nic" name="nic" value="eth0"/>
>       <nvpair id="virtual-ip-addr-netmask" name="cidr_netmask" value="22"/>
>     </attributes>
>   </instance_attributes>
>   <operations>
>     <op id="virtual-ip-monitor-10s" interval="10s" name="monitor"/>
>   </operations>
> </primitive>
> 
> Here is a reduced log (only the IP-related entries):
> Feb  7 13:39:40 serverA pengine: [8695]: notice: unpack_rsc_op: Operation ip_resource_monitor_0 found resource ip_resource active on serverA
> Feb  7 13:39:40 serverA pengine: [8695]: notice: native_print:      ip_resource#011(ocf::heartbeat:IPaddr):#011Started serverA
> Feb  7 13:39:40 serverA pengine: [8695]: info: native_merge_weights: ms_drbd0: Rolling back scores from ip_resource
> Feb  7 13:39:40 serverA pengine: [8695]: info: native_merge_weights: ms_drbd0: Rolling back scores from ip_resource
> Feb  7 13:39:40 serverA pengine: [8695]: info: native_merge_weights: ip_resource: Rolling back scores from fs0
> Feb  7 13:39:40 serverA pengine: [8695]: info: native_color: Resource ip_resource cannot run anywhere
> Feb  7 13:39:40 serverA pengine: [8695]: notice: LogActions: Stop resource ip_resource#011(serverA)
> Feb  7 13:39:40 serverA crmd: [8696]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Feb  7 13:39:42 serverA crmd: [8696]: info: te_rsc_command: Initiating action 33: stop ip_resource_stop_0 on serverA (local)
> Feb  7 13:39:42 serverA lrmd: [8693]: info: cancel_op: operation monitor[7] on ocf::IPaddr::ip_resource for client 8696, its parameters: CRM_meta_interval=[10000] ip=[150.158.183.30]
> Feb  7 13:39:42 serverA crmd: [8696]: info: do_lrm_rsc_op: Performing key=33:13:0:0dff3321-22f5-411c-a50a-e95fcfa4dd6f op=ip_resource_stop_0 )
> Feb  7 13:39:42 serverA lrmd: [8693]: info: rsc:ip_resource:14: stop
> Feb  7 13:39:42 serverA crmd: [8696]: info: process_lrm_event: LRM operation ip_resource_monitor_10000 (call=7, status=1, cib-update=0, confirmed=true) Cancelled
> Feb  7 13:40:02 serverA lrmd: [8693]: WARN: ip_resource:stop process (PID 10541) timed out (try 1).  Killing with signal SIGTERM (15).


The stop action times out after 20 seconds (CRM_meta_timeout=[20000]).
You should find out why. Note that IPaddr does not use ifdown ...,
but ifconfig down. You can also test the resource outside of the
cluster using ocf-tester.
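As a quick cross-check of the timeout diagnosis, the quoted log reduces to two timestamps: the lrmd "stop" entry at 13:39:42 and the SIGTERM warning at 13:40:02, a gap of 20 s, which matches CRM_meta_timeout=[20000] (milliseconds) on the stop operation. The Python sketch below is illustrative only: the helper names (syslog_time, stop_elapsed_seconds) are hypothetical, the log lines are the wrapped entries from the quoted message rejoined onto single lines, and because classic syslog timestamps carry no year, it assumes 2011 from the date of this thread.

```python
from datetime import datetime

# Two lrmd entries from the quoted log, rejoined onto single lines.
LOG = [
    "Feb  7 13:39:42 serverA lrmd: [8693]: info: rsc:ip_resource:14: stop",
    "Feb  7 13:40:02 serverA lrmd: [8693]: WARN: ip_resource:stop process "
    "(PID 10541) timed out (try 1).  Killing with signal SIGTERM (15).",
]

# Taken from "CRM_meta_timeout=[20000]" in the stop operation's parameters.
CRM_META_TIMEOUT_MS = 20000


def syslog_time(line, year=2011):
    """Parse the leading 15-character syslog timestamp, e.g. 'Feb  7 13:39:42'.

    Classic syslog omits the year, so one is supplied (assumed default:
    2011, the year of this thread).
    """
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def stop_elapsed_seconds(lines, rsc="ip_resource"):
    """Seconds between lrmd starting a stop of `rsc` and that stop timing out."""
    started = next(
        syslog_time(ln) for ln in lines
        if f"rsc:{rsc}:" in ln and ln.rstrip().endswith(": stop")
    )
    timed_out = next(
        syslog_time(ln) for ln in lines
        if f"{rsc}:stop process" in ln and "timed out" in ln
    )
    return (timed_out - started).total_seconds()


if __name__ == "__main__":
    elapsed = stop_elapsed_seconds(LOG)
    print(f"stop ran {elapsed:.0f} s; op timeout is {CRM_META_TIMEOUT_MS / 1000:.0f} s")
    # prints: stop ran 20 s; op timeout is 20 s
```

A gap equal to the configured timeout means lrmd killed the agent while its stop was still running, so what the stop is blocking on is the thing to investigate.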

Thanks,

Dejan

> Feb  7 13:40:02 serverA lrmd: [8693]: WARN: operation stop[14] on ocf::IPaddr::ip_resource for client 8696, its parameters: ip=[150.158.183.30] cidr_netmask=[22] CRM_meta_timeout=[20000]
> Feb  7 13:40:02 serverA lrmd: [8693]: info: record_op_completion: cannot record operation stop[14] on ocf::IPaddr::ip_resource for client 8696: the client is gone
> Feb  7 13:40:02 serverA lrmd: [8693]: WARN: notify_client: client for the operation operation stop[14] on ocf::IPaddr::ip_resource for client 8696, its parameters: ip=[150.158.183.30]
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: 
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
