What I'm finding on further investigation is that all the pacemaker child processes are dying on startup:

Mar 08 08:15:28 corosync [pcmk ] ERROR: pcmk_wait_dispatch: Child process lrmd exited (pid=6356, rc=100)
Mar 08 08:15:28 corosync [pcmk ] notice: pcmk_wait_dispatch: Child process lrmd no longer wishes to be respawned
Mar 08 08:15:28 corosync [pcmk ] debug: send_cluster_id: Leaving born-on unset: 308
Mar 08 08:15:28 corosync [pcmk ] debug: send_cluster_id: Local update: id=168430090, born=0, seq=308
Mar 08 08:15:28 corosync [pcmk ] info: update_member: Node wwww.com now has process list: 00000000000000000000000000111302 (1118978)
Mar 08 08:15:28 corosync [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib exited (pid=6355, rc=100)
Mar 08 08:15:28 corosync [pcmk ] notice: pcmk_wait_dispatch: Child process cib no longer wishes to be respawned
Mar 08 08:15:28 corosync [pcmk ] debug: send_cluster_id: Leaving born-on unset: 308
Mar 08 08:15:28 corosync [pcmk ] debug: send_cluster_id: Local update: id=168430090, born=0, seq=308
Mar 08 08:15:28 corosync [pcmk ] info: update_member: Node wwww.com now has process list: 00000000000000000000000000111202 (1118722)
Mar 08 08:15:28 corosync [pcmk ] ERROR: pcmk_wait_dispatch: Child process crmd exited (pid=6359, rc=100)
Mar 08 08:15:28 corosync [pcmk ] notice: pcmk_wait_dispatch: Child process crmd no longer wishes to be respawned
Mar 08 08:15:28 corosync [pcmk ] debug: send_cluster_id: Leaving born-on unset: 308
Mar 08 08:15:28 corosync [pcmk ] debug: send_cluster_id: Local update: id=168430090, born=0, seq=308
Mar 08 08:15:28 corosync [pcmk ] info: update_member: Node wwww.com now has process list: 00000000000000000000000000111002 (1118210)
Mar 08 08:15:28 corosync [TOTEM ] mcasted message added to pending queue
Mar 08 08:15:28 corosync [pcmk ] ERROR: pcmk_wait_dispatch: Child process attrd exited (pid=6357, rc=100)
Mar 08 08:15:28 corosync [pcmk ] notice: pcmk_wait_dispatch: Child process attrd no longer wishes to be respawned
Mar 08 08:15:28 corosync [pcmk ] debug: send_cluster_id: Leaving born-on unset: 308
Mar 08 08:15:28 corosync [pcmk ] debug: send_cluster_id: Local update: id=168430090, born=0, seq=308
Mar 08 08:15:28 corosync [pcmk ] info: update_member: Node wwww.com now has process list: 00000000000000000000000000110002 (1114114)
Mar 08 08:15:28 corosync [pcmk ] ERROR: pcmk_wait_dispatch: Child process pengine exited (pid=6358, rc=100)
Mar 08 08:15:28 corosync [pcmk ] notice: pcmk_wait_dispatch: Child process pengine no longer wishes to be respawned
Mar 08 08:15:28 corosync [pcmk ] debug: send_cluster_id: Leaving born-on unset: 308
Mar 08 08:15:28 corosync [pcmk ] debug: send_cluster_id: Local update: id=168430090, born=0, seq=308
Mar 08 08:15:28 corosync [pcmk ] info: update_member: Node wwww.com now has process list: 00000000000000000000000000100002 (1048578)
Mar 08 08:15:28 corosync [TOTEM ] mcasted message added to pending queue
Mar 08 08:15:28 corosync [pcmk ] ERROR: pcmk_wait_dispatch: Child process stonith-ng exited (pid=6354, rc=100)
Mar 08 08:15:28 corosync [pcmk ] notice: pcmk_wait_dispatch: Child process stonith-ng no longer wishes to be respawned
Mar 08 08:15:28 corosync [pcmk ] debug: send_cluster_id: Leaving born-on unset: 308
Mar 08 08:15:28 corosync [pcmk ] debug: send_cluster_id: Local update: id=168430090, born=0, seq=308
Mar 08 08:15:28 corosync [pcmk ] info: update_member: Node wwww.com now has process list: 00000000000000000000000000000002 (2)
Mar 08 08:15:28 corosync [TOTEM ] mcasted message added to pending queue

________________________________
From: Dan Frincu <[email protected]>
To: [email protected]
Sent: Tue, 8 March, 2011 2:45:00
Subject: Re: [Openais] firewire

On Tue, Mar 8, 2011 at 2:07 AM, ray klassen <[email protected]> wrote:

>well I have the 1.3.0 version of corosync seemingly happy with udpu and
>firewire. The logs report connection back and forth between the two boxes. But
>now crm_mon never connects. Does pacemaker not support udpu yet?
>

Pacemaker is the Cluster Resource Manager, so it doesn't really care about the
underlying method that the Messaging and Membership layer uses to connect
between nodes. I've had this issue (crm_mon not connecting) when I performed an
upgrade from openais-0.80 to corosync-1.3.0 with udpu; I solved it by
eventually rebooting the servers. In your case I doubt it's an upgrade between
versions of software, since you've reinstalled.

My 2 cents.

>pacemaker-1.1.4-5.fc14.i686
>(I switched to fedora from debian to get the latest version of corosync)
>
>----- Original Message ----
>From: Steven Dake <[email protected]>
>To: ray klassen <[email protected]>
>Cc: [email protected]
>Sent: Thu, 3 March, 2011 16:56:21
>Subject: Re: [Openais] firewire
>
>On 03/03/2011 05:45 PM, ray klassen wrote:
>> Has anyone had any success running corosync with the firewire-net module? I
>> want to set up a two node router cluster with a dedicated link between the
>> routers. Only problem is, I've run out of ethernet ports so I've got ip
>> configured on the firewire ports. pinging's no problem between the
>> addresses.. funny thing is, on one of them (and they're really identical)
>> corosync starts up no problem at all and stays up. on the other one corosync
>> fails with "ERROR: ais_dispatch: Receiving message body failed: (2) Library
>> error: Resource temporarily unavailable (11)."
>>
>> Reading up on the firewire-net mailing outstanding issues turned up that
>> multicast wasn't fully implemented so my corosync.conf files both say
>> broadcast: yes. instead of mcast-addr
>>
>> Firewire-net was emitting fwnet_write_complete: failed: 10 errors so I pulled
>> down the latest vanilla kernel 2.6.37.2 and am running that. with far fewer
>> of that error..
>>
>> otherwise versions are
>> Debian Squeeze
>> Corosync Version: 1.2.1-4
>> Pacemaker 1.0.9.1+hg15626-1
>>
>> Is this a hopeless case? I've got a debug log from corosync that doesn't
>> seem that helpful. If you want I can post that as well
>>
>> Thanks
>>
>
>I'm hesitant to suggest using firewire as a transport as you're the first
>person that has ever tried it. If multicast is broken on your hardware,
>you might try the "udpu" transport which uses UDP only (udp is the basis
>for all network communication).
>
>Regards
>-steve
>
>> _______________________________________________
>> Openais mailing list
>> [email protected]
>> https://lists.linux-foundation.org/mailman/listinfo/openais
>

--
Dan Frincu
CCNA, RHCE
_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais
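
For anyone reading a log like the pcmk one at the top of this message, here is a minimal Python sketch that lists which child processes exited and with what rc, and shows which bit drops out of the "process list" value between consecutive updates. It assumes the process list string is hexadecimal, which fits the log (0x111302 = 1118978, the number in parentheses). The function names `summarize` and `cleared_bits` are made up for this illustration and are not part of corosync or pacemaker, and the sketch makes no claim about what each bit means.

```python
import re

# Four lines copied from the log at the top of this message.
LOG = """\
Mar 08 08:15:28 corosync [pcmk ] ERROR: pcmk_wait_dispatch: Child process lrmd exited (pid=6356, rc=100)
Mar 08 08:15:28 corosync [pcmk ] info: update_member: Node wwww.com now has process list: 00000000000000000000000000111302 (1118978)
Mar 08 08:15:28 corosync [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib exited (pid=6355, rc=100)
Mar 08 08:15:28 corosync [pcmk ] info: update_member: Node wwww.com now has process list: 00000000000000000000000000111202 (1118722)
"""

EXIT_RE = re.compile(r"Child process (\S+) exited \(pid=(\d+), rc=(\d+)\)")
LIST_RE = re.compile(r"now has process list: ([0-9a-fA-F]+) \((\d+)\)")


def summarize(log_text):
    """Return (exits, masks) found in the log text.

    exits is a list of (name, pid, rc) tuples; masks is a list of the
    process list values, parsed as hexadecimal (an assumption).
    """
    exits = [(name, int(pid), int(rc))
             for name, pid, rc in EXIT_RE.findall(log_text)]
    masks = [int(hex_str, 16) for hex_str, _dec in LIST_RE.findall(log_text)]
    return exits, masks


def cleared_bits(masks):
    """For each consecutive pair of masks, return the bits that were cleared."""
    return [prev & ~cur for prev, cur in zip(masks, masks[1:])]


if __name__ == "__main__":
    exits, masks = summarize(LOG)
    for name, pid, rc in exits:
        print(f"{name}: pid={pid} rc={rc}")
    for bits in cleared_bits(masks):
        print(f"bits cleared: {bits:#x}")
```

Feeding it the full log instead of the four sample lines should give one entry per child and one cleared-bit value per process list update after the first.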
