One other thing: in this configuration, corosync itself has to be shot in the head to stop. Running /etc/init.d/corosync stop prints "Waiting for corosync services to stop" followed by lines and lines of dots. kill -9 is the only way to stop it, it seems.
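The manual workaround described above (ask the daemon to stop, then fall back to kill -9 when it hangs) can be sketched in Python. This is only an illustration under stated assumptions: the function names, the timeout values, and the idea of probing the pid with signal 0 are not from the thread, and how the corosync pid is obtained is left to the caller because the thread does not say where it is recorded.

```python
import os
import signal
import time


def pid_alive(pid):
    """Probe a pid with signal 0, which checks existence without delivering a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stop_with_escalation(pid, grace=10.0, poll=0.2, alive=pid_alive):
    """Send SIGTERM; if the process is still there after `grace` seconds, send SIGKILL.

    Returns "already-gone", "terminated" or "killed".
    `alive` is a hypothetical hook so callers can supply their own liveness check.
    """
    if not alive(pid):
        return "already-gone"
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return "already-gone"  # exited between the check and the signal
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        if not alive(pid):
            return "terminated"
        time.sleep(poll)
    try:
        os.kill(pid, signal.SIGKILL)  # the kill -9 fallback
    except ProcessLookupError:
        return "terminated"  # exited just as the grace period ran out
    return "killed"
```

Called with the daemon's pid, this returns "killed" in the situation described above, where a normal stop request never completes. It does nothing about the underlying reason the shutdown hangs.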
----- Original Message ----
From: ray klassen <[email protected]>
To: [email protected]
Sent: Tue, 8 March, 2011 13:12:27
Subject: Re: [Openais] firewire

MCP is not really mentioned anywhere except ClusterGuy's blog (maybe you're him), but from that I'm assuming you mean starting pacemaker separately, as /etc/init.d/pacemaker. So I removed the /etc/corosync/services.d/pcmk file. I also added 'cman' (yum install cman -- for mailing list readers yet to come) from alternative 2 on ClusterGuy's page on 'MCP' ( http://theclusterguy.clusterlabs.org/post/907043024/introducing-the-pacemaker-master-control-process-for ).

And it does work. I can now view a 'partition with quorum' with crm_mon, over firewire, with udpu.

I just don't really know how it works. How does pacemaker communicate with the stack? Unix sockets? Shared memory? How does corosync communicate with the stack?

----- Original Message ----
From: Steven Dake <[email protected]>
To: ray klassen <[email protected]>
Cc: [email protected]
Sent: Tue, 8 March, 2011 10:02:28
Subject: Re: [Openais] firewire

First off, I'd recommend using the "MCP" process that is part of Pacemaker rather than the plugin.

Second, if you could run corosync-objctl and post the output to the list, along with your /etc/corosync/corosync.conf, that would be helpful.

Regards
-steve

On 03/08/2011 09:19 AM, ray klassen wrote:
> what I'm finding on further investigation is that all the pacemaker child processes are dying on startup
>
> Mar 08 08:15:28 corosync [pcmk ] ERROR: pcmk_wait_dispatch: Child process lrmd exited (pid=6356, rc=100)
> Mar 08 08:15:28 corosync [pcmk ] notice: pcmk_wait_dispatch: Child process lrmd no longer wishes to be respawned
> Mar 08 08:15:28 corosync [pcmk ] debug: send_cluster_id: Leaving born-on unset: 308
> Mar 08 08:15:28 corosync [pcmk ] debug: send_cluster_id: Local update: id=168430090, born=0, seq=308
> Mar 08 08:15:28 corosync [pcmk ] info: update_member: Node wwww.com now has process list: 00000000000000000000000000111302 (1118978)
> Mar 08 08:15:28 corosync [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib exited (pid=6355, rc=100)
> Mar 08 08:15:28 corosync [pcmk ] notice: pcmk_wait_dispatch: Child process cib no longer wishes to be respawned
> Mar 08 08:15:28 corosync [pcmk ] debug: send_cluster_id: Leaving born-on unset: 308
> Mar 08 08:15:28 corosync [pcmk ] debug: send_cluster_id: Local update: id=168430090, born=0, seq=308
> Mar 08 08:15:28 corosync [pcmk ] info: update_member: Node wwww.com now has process list: 00000000000000000000000000111202 (1118722)
> Mar 08 08:15:28 corosync [pcmk ] ERROR: pcmk_wait_dispatch: Child process crmd exited (pid=6359, rc=100)
> Mar 08 08:15:28 corosync [pcmk ] notice: pcmk_wait_dispatch: Child process crmd no longer wishes to be respawned
> Mar 08 08:15:28 corosync [pcmk ] debug: send_cluster_id: Leaving born-on unset: 308
> Mar 08 08:15:28 corosync [pcmk ] debug: send_cluster_id: Local update: id=168430090, born=0, seq=308
> Mar 08 08:15:28 corosync [pcmk ] info: update_member: Node wwww.com now has process list: 00000000000000000000000000111002 (1118210)
> Mar 08 08:15:28 corosync [TOTEM ] mcasted message added to pending queue
> Mar 08 08:15:28 corosync [pcmk ] ERROR: pcmk_wait_dispatch: Child process attrd exited (pid=6357, rc=100)
> Mar 08 08:15:28 corosync [pcmk ] notice: pcmk_wait_dispatch: Child process attrd no longer wishes to be respawned
> Mar 08 08:15:28 corosync [pcmk ] debug: send_cluster_id: Leaving born-on unset: 308
> Mar 08 08:15:28 corosync [pcmk ] debug: send_cluster_id: Local update: id=168430090, born=0, seq=308
> Mar 08 08:15:28 corosync [pcmk ] info: update_member: Node wwww.com now has process list: 00000000000000000000000000110002 (1114114)
> Mar 08 08:15:28 corosync [pcmk ] ERROR: pcmk_wait_dispatch: Child process pengine exited (pid=6358, rc=100)
> Mar 08 08:15:28 corosync [pcmk ] notice: pcmk_wait_dispatch: Child process pengine no longer wishes to be respawned
> Mar 08 08:15:28 corosync [pcmk ] debug: send_cluster_id: Leaving born-on unset: 308
> Mar 08 08:15:28 corosync [pcmk ] debug: send_cluster_id: Local update: id=168430090, born=0, seq=308
> Mar 08 08:15:28 corosync [pcmk ] info: update_member: Node wwww.com now has process list: 00000000000000000000000000100002 (1048578)
> Mar 08 08:15:28 corosync [TOTEM ] mcasted message added to pending queue
> Mar 08 08:15:28 corosync [pcmk ] ERROR: pcmk_wait_dispatch: Child process stonith-ng exited (pid=6354, rc=100)
> Mar 08 08:15:28 corosync [pcmk ] notice: pcmk_wait_dispatch: Child process stonith-ng no longer wishes to be respawned
> Mar 08 08:15:28 corosync [pcmk ] debug: send_cluster_id: Leaving born-on unset: 308
> Mar 08 08:15:28 corosync [pcmk ] debug: send_cluster_id: Local update: id=168430090, born=0, seq=308
> Mar 08 08:15:28 corosync [pcmk ] info: update_member: Node wwww.com now has process list: 00000000000000000000000000000002 (2)
> Mar 08 08:15:28 corosync [TOTEM ] mcasted message added to pending queue
> Mar
>
> ------------------------------------------------------------------------
> *From:* Dan Frincu <[email protected]>
> *To:* [email protected]
> *Sent:* Tue, 8 March, 2011 2:45:00
> *Subject:* Re: [Openais] firewire
>
> On Tue, Mar 8, 2011 at 2:07 AM, ray klassen <[email protected]> wrote:
>
> well I have the 1.3.0 version of corosync seemingly happy with udpu and firewire. The logs report connection back and forth between the two boxes. But now crm_mon never connects. Does pacemaker not support udpu yet?
>
> Pacemaker is the Cluster Resource Manager, so it doesn't really care about the underlying method that the Messaging and Membership layer uses to connect between nodes.
>
> I've had this issue (crm_mon not connecting) when I performed an upgrade from openais-0.80 to corosync-1.3.0 with udpu, I solved it by eventually rebooting the servers. In your case I doubt it's an upgrade between versions of software, since you've reinstalled.
>
> My 2 cents.
>
> pacemaker-1.1.4-5.fc14.i686
> (I switched to fedora from debian to get the latest version of corosync)
>
> ----- Original Message ----
> From: Steven Dake <[email protected]>
> To: ray klassen <[email protected]>
> Cc: [email protected]
> Sent: Thu, 3 March, 2011 16:56:21
> Subject: Re: [Openais] firewire
>
> On 03/03/2011 05:45 PM, ray klassen wrote:
> > Has anyone had any success running corosync with the firewire-net module? I want to set up a two node router cluster with a dedicated link between the routers. Only problem is, I've run out of ethernet ports so I've got ip configured on the firewire ports. pinging's no problem between the addresses.. funny thing is, on one of them (and they're really identical) corosync starts up no problem at all and stays up. on the other one corosync fails with "ERROR: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)."
> >
> > Reading up on the firewire-net mailing list's outstanding issues turned up that multicast wasn't fully implemented, so my corosync.conf files both say broadcast: yes instead of mcastaddr.
> >
> > Firewire-net was emitting fwnet_write_complete: failed: 10 errors so I pulled down the latest vanilla kernel 2.6.37.2 and am running that, with far fewer of that error.
> >
> > otherwise versions are
> > Debian Squeeze
> > Corosync Version: 1.2.1-4
> > Pacemaker 1.0.9.1+hg15626-1
> >
> > Is this a hopeless case? I've got a debug log from corosync that doesn't seem that helpful. If you want I can post that as well.
> >
> > Thanks
> >
>
> I'm hesitant to suggest using firewire as a transport as you're the first person that has ever tried it. If multicast is broken on your hardware, you might try the "udpu" transport which uses UDP only (udp is the basis for all network communication).
>
> Regards
> -steve
>
> > _______________________________________________
> > Openais mailing list
> > [email protected]
> > https://lists.linux-foundation.org/mailman/listinfo/openais
>
> --
> Dan Frincu
> CCNA, RHCE
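
A note on the "process list" values in the quoted pcmk log: read as hexadecimal, each 32-digit field matches the decimal in parentheses, and each child exit clears exactly one bit. The Python sketch below uses only values copied from that log. Treating the field as a per-daemon bitmask is an inference from these numbers, not something stated in the thread, and the helper names are hypothetical.

```python
# (child that had just exited, process-list field, decimal shown in parentheses),
# in the order the lines appear in the quoted log. Leading zeros of the
# 32-digit field are dropped here; they do not change the value.
LOG = [
    ("lrmd", "111302", 1118978),
    ("cib", "111202", 1118722),
    ("crmd", "111002", 1118210),
    ("attrd", "110002", 1114114),
    ("pengine", "100002", 1048578),
    ("stonith-ng", "000002", 2),
]


def infer_bits(log):
    """Map each child to the bit that disappeared from the list when it exited.

    The first entry (lrmd) cannot be resolved because the log does not show
    the value before its exit.
    """
    bits = {}
    for (_, before, _), (name, after, _) in zip(log, log[1:]):
        bits[name] = int(before, 16) & ~int(after, 16)
    return bits


def running(mask, bits):
    """List the children whose inferred bit is still set in `mask`."""
    return sorted(name for name, bit in bits.items() if mask & bit)


if __name__ == "__main__":
    # The field is hexadecimal: it agrees with the decimal on every line.
    for _, field, decimal in LOG:
        assert int(field, 16) == decimal
    bits = infer_bits(LOG)
    for name, bit in bits.items():
        print(f"{name:<10} 0x{bit:06x}")
    print("still set after the first exit:", running(int(LOG[0][1], 16), bits))
```

The final value, 2, is the one bit that never clears in the log. Read this way, every pacemaker child spawned by the plugin had exited and only that remaining bit was left set, which is consistent with crm_mon being unable to connect.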
