What I'm finding on further investigation is that all the pacemaker child
processes are dying on startup:


Mar 08 08:15:28 corosync [pcmk  ] ERROR: pcmk_wait_dispatch: Child process lrmd exited (pid=6356, rc=100)
Mar 08 08:15:28 corosync [pcmk  ] notice: pcmk_wait_dispatch: Child process lrmd no longer wishes to be respawned
Mar 08 08:15:28 corosync [pcmk  ] debug: send_cluster_id: Leaving born-on unset: 308
Mar 08 08:15:28 corosync [pcmk  ] debug: send_cluster_id: Local update: id=168430090, born=0, seq=308
Mar 08 08:15:28 corosync [pcmk  ] info: update_member: Node wwww.com now has process list: 00000000000000000000000000111302 (1118978)
Mar 08 08:15:28 corosync [pcmk  ] ERROR: pcmk_wait_dispatch: Child process cib exited (pid=6355, rc=100)
Mar 08 08:15:28 corosync [pcmk  ] notice: pcmk_wait_dispatch: Child process cib no longer wishes to be respawned
Mar 08 08:15:28 corosync [pcmk  ] debug: send_cluster_id: Leaving born-on unset: 308
Mar 08 08:15:28 corosync [pcmk  ] debug: send_cluster_id: Local update: id=168430090, born=0, seq=308
Mar 08 08:15:28 corosync [pcmk  ] info: update_member: Node wwww.com now has process list: 00000000000000000000000000111202 (1118722)
Mar 08 08:15:28 corosync [pcmk  ] ERROR: pcmk_wait_dispatch: Child process crmd exited (pid=6359, rc=100)
Mar 08 08:15:28 corosync [pcmk  ] notice: pcmk_wait_dispatch: Child process crmd no longer wishes to be respawned
Mar 08 08:15:28 corosync [pcmk  ] debug: send_cluster_id: Leaving born-on unset: 308
Mar 08 08:15:28 corosync [pcmk  ] debug: send_cluster_id: Local update: id=168430090, born=0, seq=308
Mar 08 08:15:28 corosync [pcmk  ] info: update_member: Node wwww.com now has process list: 00000000000000000000000000111002 (1118210)
Mar 08 08:15:28 corosync [TOTEM ] mcasted message added to pending queue
Mar 08 08:15:28 corosync [pcmk  ] ERROR: pcmk_wait_dispatch: Child process attrd exited (pid=6357, rc=100)
Mar 08 08:15:28 corosync [pcmk  ] notice: pcmk_wait_dispatch: Child process attrd no longer wishes to be respawned
Mar 08 08:15:28 corosync [pcmk  ] debug: send_cluster_id: Leaving born-on unset: 308
Mar 08 08:15:28 corosync [pcmk  ] debug: send_cluster_id: Local update: id=168430090, born=0, seq=308
Mar 08 08:15:28 corosync [pcmk  ] info: update_member: Node wwww.com now has process list: 00000000000000000000000000110002 (1114114)
Mar 08 08:15:28 corosync [pcmk  ] ERROR: pcmk_wait_dispatch: Child process pengine exited (pid=6358, rc=100)
Mar 08 08:15:28 corosync [pcmk  ] notice: pcmk_wait_dispatch: Child process pengine no longer wishes to be respawned
Mar 08 08:15:28 corosync [pcmk  ] debug: send_cluster_id: Leaving born-on unset: 308
Mar 08 08:15:28 corosync [pcmk  ] debug: send_cluster_id: Local update: id=168430090, born=0, seq=308
Mar 08 08:15:28 corosync [pcmk  ] info: update_member: Node wwww.com now has process list: 00000000000000000000000000100002 (1048578)
Mar 08 08:15:28 corosync [TOTEM ] mcasted message added to pending queue
Mar 08 08:15:28 corosync [pcmk  ] ERROR: pcmk_wait_dispatch: Child process stonith-ng exited (pid=6354, rc=100)
Mar 08 08:15:28 corosync [pcmk  ] notice: pcmk_wait_dispatch: Child process stonith-ng no longer wishes to be respawned
Mar 08 08:15:28 corosync [pcmk  ] debug: send_cluster_id: Leaving born-on unset: 308
Mar 08 08:15:28 corosync [pcmk  ] debug: send_cluster_id: Local update: id=168430090, born=0, seq=308
Mar 08 08:15:28 corosync [pcmk  ] info: update_member: Node wwww.com now has process list: 00000000000000000000000000000002 (2)
Mar 08 08:15:28 corosync [TOTEM ] mcasted message added to pending queue
Mar
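
For anyone reading the log, the "process list" value looks like a hexadecimal bitmask, with the same number in decimal in parentheses (0x111302 = 1118978). Each child exit is followed by a list with one fewer bit set. Below is a minimal Python sketch that works out which bit disappears after each exit. It assumes that each exit clears exactly one bit and that the list printed right after an exit reflects that exit. The names EVENTS and cleared_bits are mine, for illustration only; the values are copied from the log above.

```python
# (child that exited, process list reported right after it), copied from the log.
EVENTS = [
    ("lrmd",       "00000000000000000000000000111302"),
    ("cib",        "00000000000000000000000000111202"),
    ("crmd",       "00000000000000000000000000111002"),
    ("attrd",      "00000000000000000000000000110002"),
    ("pengine",    "00000000000000000000000000100002"),
    ("stonith-ng", "00000000000000000000000000000002"),
]


def cleared_bits(events):
    """Yield (child, cleared_bit, mask_after) for each exit.

    cleared_bit is None for the first entry, because the excerpt does not
    show the process list from before that exit.
    """
    previous = None
    for child, hex_mask in events:
        mask = int(hex_mask, 16)
        # Bits that were set in the previous list but are gone now.
        cleared = None if previous is None else previous & ~mask
        yield child, cleared, mask
        previous = mask


if __name__ == "__main__":
    for child, cleared, mask in cleared_bits(EVENTS):
        bit = "unknown (no earlier list)" if cleared is None else f"{cleared:#x}"
        print(f"{child:<11} cleared {bit}, remaining {mask:#x} ({mask})")
```

By the end only 0x2 is left, which I assume is the plugin itself, so none of the pacemaker daemons is still running on this node.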






________________________________
From: Dan Frincu <[email protected]>
To: [email protected]
Sent: Tue, 8 March, 2011 2:45:00
Subject: Re: [Openais] firewire




On Tue, Mar 8, 2011 at 2:07 AM, ray klassen <[email protected]> wrote:

>Well, I have the 1.3.0 version of corosync seemingly happy with udpu and
>firewire. The logs report connection back and forth between the two boxes. But
>now crm_mon never connects. Does pacemaker not support udpu yet?
>

Pacemaker is the Cluster Resource Manager, so it doesn't really care about the 
underlying method that the Messaging and Membership layer uses to connect 
between nodes.

I've had this issue (crm_mon not connecting) when I performed an upgrade from 
openais-0.80 to corosync-1.3.0 with udpu; I eventually solved it by rebooting 
the servers. In your case I doubt it's caused by an upgrade between software 
versions, since you've reinstalled.

My 2 cents.
 

>pacemaker-1.1.4-5.fc14.i686
>(I switched to fedora from debian to get the latest version of corosync)
>
>
>
>
>
>----- Original Message ----
>From: Steven Dake <[email protected]>
>To: ray klassen <[email protected]>
>Cc: [email protected]
>Sent: Thu, 3 March, 2011 16:56:21
>Subject: Re: [Openais] firewire
>
>
>On 03/03/2011 05:45 PM, ray klassen wrote:
>> Has anyone had any success running corosync with the firewire-net module? I
>> want to set up a two-node router cluster with a dedicated link between the
>> routers. Only problem is, I've run out of Ethernet ports, so I've got IP
>> configured on the firewire ports. Pinging's no problem between the
>> addresses. Funny thing is, on one of them (and they're really identical)
>> corosync starts up no problem at all and stays up. On the other one corosync
>> fails with "ERROR: ais_dispatch: Receiving message body failed: (2) Library
>> error: Resource temporarily unavailable (11)."
>>
>> Reading up on the firewire-net mailing list's outstanding issues turned up
>> that multicast wasn't fully implemented, so my corosync.conf files both say
>> "broadcast: yes" instead of mcastaddr.
>>
>> Firewire-net was emitting "fwnet_write_complete: failed: 10" errors, so I
>> pulled down the latest vanilla kernel, 2.6.37.2, and am running that, with
>> far fewer of those errors.
>>
>> Otherwise versions are:
>> Debian Squeeze
>> Corosync Version: 1.2.1-4
>> Pacemaker 1.0.9.1+hg15626-1
>>
>> Is this a hopeless case? I've got a debug log from corosync that doesn't
>> seem that helpful. If you want, I can post that as well.
>>
>> Thanks
>>
>
>I'm hesitant to suggest using firewire as a transport as you're the first
>person that has ever tried it.  If multicast is broken on your hardware,
>you might try the "udpu" transport which uses UDP only (udp is the basis
>for all network communication).
>
>Regards
>-steve
>
>>
>>
>> _______________________________________________
>> Openais mailing list
>> [email protected]
>> https://lists.linux-foundation.org/mailman/listinfo/openais


-- 
Dan Frincu
CCNA, RHCE



      