Re: [Openais] firewire

ray klassen Tue, 08 Mar 2011 13:43:55 -0800

One other thing: in this configuration, corosync itself has to be shot in the
head to stop. /etc/init.d/corosync stop just prints something like
"Waiting for corosync services to stop" followed by lines and lines of dots.
kill -9 seems to be the only way.





----- Original Message ----
From: ray klassen <[email protected]>
To: [email protected]
Sent: Tue, 8 March, 2011 13:12:27
Subject: Re: [Openais] firewire

MCP is not really mentioned anywhere except ClusterGuy's blog (maybe you're
him), but from that I'm assuming you mean starting pacemaker separately, as
/etc/init.d/pacemaker. So I removed the /etc/corosync/service.d/pcmk file. I
also added 'cman' (yum install cman, for mailing list readers yet to come),
following alternative 2 on ClusterGuy's page on 'MCP':
http://theclusterguy.clusterlabs.org/post/907043024/introducing-the-pacemaker-master-control-process-for


And it does work. I can now see a 'partition with quorum' in crm_mon, over
firewire, with udpu.

I just don't really know how it works. How does pacemaker communicate with the
stack? Unix sockets? Shared memory? And how does corosync communicate with the
stack?







----- Original Message ----
From: Steven Dake <[email protected]>
To: ray klassen <[email protected]>
Cc: [email protected]
Sent: Tue, 8 March, 2011 10:02:28
Subject: Re: [Openais] firewire

First off, I'd recommend using the "MCP" process that is part of
Pacemaker rather than the plugin.

Second, if you could run corosync-objctl and post the output to the list,
along with your /etc/corosync/corosync.conf, that would be helpful.

Regards
-steve

On 03/08/2011 09:19 AM, ray klassen wrote:
> what I'm finding on further investigation is that all the pacemaker
> child processes are dying on startup
> 
> 
> Mar 08 08:15:28 corosync [pcmk  ] ERROR: pcmk_wait_dispatch: Child
> process lrmd exited (pid=6356, rc=100)
> Mar 08 08:15:28 corosync [pcmk  ] notice: pcmk_wait_dispatch: Child
> process lrmd no longer wishes to be respawned
> Mar 08 08:15:28 corosync [pcmk  ] debug: send_cluster_id: Leaving
> born-on unset: 308
> Mar 08 08:15:28 corosync [pcmk  ] debug: send_cluster_id: Local update:
> id=168430090, born=0, seq=308
> Mar 08 08:15:28 corosync [pcmk  ] info: update_member: Node wwww.com now
> has process list: 00000000000000000000000000111302 (1118978)
> Mar 08 08:15:28 corosync [pcmk  ] ERROR: pcmk_wait_dispatch: Child
> process cib exited (pid=6355, rc=100)
> Mar 08 08:15:28 corosync [pcmk  ] notice: pcmk_wait_dispatch: Child
> process cib no longer wishes to be respawned
> Mar 08 08:15:28 corosync [pcmk  ] debug: send_cluster_id: Leaving
> born-on unset: 308
> Mar 08 08:15:28 corosync [pcmk  ] debug: send_cluster_id: Local update:
> id=168430090, born=0, seq=308
> Mar 08 08:15:28 corosync [pcmk  ] info: update_member: Node wwww.com now
> has process list: 00000000000000000000000000111202 (1118722)
> Mar 08 08:15:28 corosync [pcmk  ] ERROR: pcmk_wait_dispatch: Child
> process crmd exited (pid=6359, rc=100)
> Mar 08 08:15:28 corosync [pcmk  ] notice: pcmk_wait_dispatch: Child
> process crmd no longer wishes to be respawned
> Mar 08 08:15:28 corosync [pcmk  ] debug: send_cluster_id: Leaving
> born-on unset: 308
> Mar 08 08:15:28 corosync [pcmk  ] debug: send_cluster_id: Local update:
> id=168430090, born=0, seq=308
> Mar 08 08:15:28 corosync [pcmk  ] info: update_member: Node wwww.com now
> has process list: 00000000000000000000000000111002 (1118210)
> Mar 08 08:15:28 corosync [TOTEM ] mcasted message added to pending queue
> Mar 08 08:15:28 corosync [pcmk  ] ERROR: pcmk_wait_dispatch: Child
> process attrd exited (pid=6357, rc=100)
> Mar 08 08:15:28 corosync [pcmk  ] notice: pcmk_wait_dispatch: Child
> process attrd no longer wishes to be respawned
> Mar 08 08:15:28 corosync [pcmk  ] debug: send_cluster_id: Leaving
> born-on unset: 308
> Mar 08 08:15:28 corosync [pcmk  ] debug: send_cluster_id: Local update:
> id=168430090, born=0, seq=308
> Mar 08 08:15:28 corosync [pcmk  ] info: update_member: Node wwww.com now
> has process list: 00000000000000000000000000110002 (1114114)
> Mar 08 08:15:28 corosync [pcmk  ] ERROR: pcmk_wait_dispatch: Child
> process pengine exited (pid=6358, rc=100)
> Mar 08 08:15:28 corosync [pcmk  ] notice: pcmk_wait_dispatch: Child
> process pengine no longer wishes to be respawned
> Mar 08 08:15:28 corosync [pcmk  ] debug: send_cluster_id: Leaving
> born-on unset: 308
> Mar 08 08:15:28 corosync [pcmk  ] debug: send_cluster_id: Local update:
> id=168430090, born=0, seq=308
> Mar 08 08:15:28 corosync [pcmk  ] info: update_member: Node wwww.com now
> has process list: 00000000000000000000000000100002 (1048578)
> Mar 08 08:15:28 corosync [TOTEM ] mcasted message added to pending queue
> Mar 08 08:15:28 corosync [pcmk  ] ERROR: pcmk_wait_dispatch: Child
> process stonith-ng exited (pid=6354, rc=100)
> Mar 08 08:15:28 corosync [pcmk  ] notice: pcmk_wait_dispatch: Child
> process stonith-ng no longer wishes to be respawned
> Mar 08 08:15:28 corosync [pcmk  ] debug: send_cluster_id: Leaving
> born-on unset: 308
> Mar 08 08:15:28 corosync [pcmk  ] debug: send_cluster_id: Local update:
> id=168430090, born=0, seq=308
> Mar 08 08:15:28 corosync [pcmk  ] info: update_member: Node wwww.com now
> has process list: 00000000000000000000000000000002 (2)
> Mar 08 08:15:28 corosync [TOTEM ] mcasted message added to pending queue
> Mar
> 
> 
> 
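A note on reading the log quoted above: the long "process list" number is the
hexadecimal form of the value in parentheses, and each child exit clears one
bit of it. The node id can be read in a similar way. Below is a minimal Python
sketch of that decoding, using only values copied from the log. The function
names are made up for illustration, the bit-to-child mapping is inferred
purely from the order of the log lines (not taken from the pacemaker source),
and reading the id as an IPv4 address assumes corosync derived it from the
ring 0 address because no nodeid was configured.

```python
# Values copied from the quoted log:
# (child that exited, process list logged right after, decimal in parentheses)
LOGGED = [
    ("lrmd",       "00000000000000000000000000111302", 1118978),
    ("cib",        "00000000000000000000000000111202", 1118722),
    ("crmd",       "00000000000000000000000000111002", 1118210),
    ("attrd",      "00000000000000000000000000110002", 1114114),
    ("pengine",    "00000000000000000000000000100002", 1048578),
    ("stonith-ng", "00000000000000000000000000000002", 2),
]
NODE_ID = 168430090  # from "Local update: id=168430090"


def cleared_bits(logged):
    """Map each child to the bit that disappeared when it exited.

    Only children whose preceding process list is also in the log can be
    resolved, so the first entry (lrmd) is skipped.
    """
    masks = [int(hex_str, 16) for _, hex_str, _ in logged]
    result = {}
    for (child, _, _), before, after in zip(logged[1:], masks, masks[1:]):
        result[child] = before ^ after
    return result


def nodeid_to_dotted(nodeid):
    """Render a 32-bit node id as four dotted bytes.

    Assumption: the id was auto-generated from an IPv4 address. All four
    bytes of this particular id are equal, so byte order does not matter
    here; for other ids it would.
    """
    return ".".join(str(b) for b in nodeid.to_bytes(4, "big"))


if __name__ == "__main__":
    # The hex string and the decimal in parentheses are the same number.
    for child, hex_str, dec in LOGGED:
        assert int(hex_str, 16) == dec, child
    for child, bit in cleared_bits(LOGGED).items():
        print(f"{child:<10} cleared bit {bit:#x}")
    print("node id", NODE_ID, "->", nodeid_to_dotted(NODE_ID))
```

Run as-is, this reports 0x100 for cib, 0x200 for crmd, 0x1000 for attrd,
0x10000 for pengine and 0x100000 for stonith-ng, and 10.10.10.10 for the node
id. The lrmd bit cannot be recovered because the list from before its exit is
not in the excerpt, and these lines do not tie the final 0x2 to any child.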
> ------------------------------------------------------------------------
> *From:* Dan Frincu <[email protected]>
> *To:* [email protected]
> *Sent:* Tue, 8 March, 2011 2:45:00
> *Subject:* Re: [Openais] firewire
> 
> 
> 
> On Tue, Mar 8, 2011 at 2:07 AM, ray klassen <[email protected]> wrote:
> 
>     Well, I have the 1.3.0 version of corosync seemingly happy with udpu and
>     firewire. The logs report connection back and forth between the two
>     boxes. But now crm_mon never connects. Does pacemaker not support udpu
>     yet?
> 
> 
> Pacemaker is the Cluster Resource Manager, so it doesn't really care
> about the underlying method that the Messaging and Membership layer uses
> to connect between nodes.
> 
> I've had this issue (crm_mon not connecting) when I upgraded from
> openais-0.80 to corosync-1.3.0 with udpu; I eventually solved it by
> rebooting the servers. In your case I doubt it's caused by an upgrade
> between software versions, since you've reinstalled.
> 
> My 2 cents.
>  
> 
> 
>     pacemaker-1.1.4-5.fc14.i686
>     (I switched to fedora from debian to get the latest version of corosync)
> 
> 
> 
> 
>     ----- Original Message ----
>     From: Steven Dake <[email protected]>
>     To: ray klassen <[email protected]>
>     Cc: [email protected]
>     Sent: Thu, 3 March, 2011 16:56:21
>     Subject: Re: [Openais] firewire
> 
>     On 03/03/2011 05:45 PM, ray klassen wrote:
>     > Has anyone had any success running corosync with the firewire-net
>     > module? I want to set up a two-node router cluster with a dedicated
>     > link between the routers. Only problem is, I've run out of ethernet
>     > ports, so I've got IP configured on the firewire ports. Pinging is no
>     > problem between the addresses. Funny thing is, on one of them (and
>     > they're really identical) corosync starts up no problem at all and
>     > stays up. On the other one corosync fails with "ERROR: ais_dispatch:
>     > Receiving message body failed: (2) Library error: Resource temporarily
>     > unavailable (11)."
>     >
>     > Reading up on the firewire-net mailing list's outstanding issues
>     > turned up that multicast wasn't fully implemented, so my corosync.conf
>     > files both say "broadcast: yes" instead of mcastaddr.
>     >
>     > Firewire-net was emitting "fwnet_write_complete: failed: 10" errors, so
>     > I pulled down the latest vanilla kernel, 2.6.37.2, and am running that,
>     > with far fewer of those errors.
>     >
>     > Otherwise versions are:
>     > Debian Squeeze
>     > Corosync Version: 1.2.1-4
>     > Pacemaker 1.0.9.1+hg15626-1
>     >
>     > Is this a hopeless case? I've got a debug log from corosync that
>     > doesn't seem that helpful. If you want I can post that as well.
>     >
>     > Thanks
>     >
> 
>     I'm hesitant to suggest using firewire as a transport, as you're the
>     first person that has ever tried it.  If multicast is broken on your
>     hardware, you might try the "udpu" transport, which uses UDP unicast
>     only (udp is the basis for all network communication).
> 
>     Regards
>     -steve
> 
>     >
>     >
>     > _______________________________________________
>     > Openais mailing list
>     > [email protected]
>     > https://lists.linux-foundation.org/mailman/listinfo/openais
> 
> 
> 
> 
> 
> 
> 
> -- 
> Dan Frincu
> CCNA, RHCE
> 
> 
> 
> 


      
