On 06/20/2011 04:40 AM, Hans Lammerts wrote:
> Hi,
> 
>  
> 
> I just installed two servers, both with Ubuntu 10.04 64bit. No problems
> here.
> 
> The idea is to create a cluster that is going to serve as a highly
> available Zarafa environment.
> 
> One of the steps is to install corosync (which gets installed
> automatically when installing Pacemaker).
> 
> I have two NICs on both machines:
> 
> machine 1: one wired Ethernet card (eth0), ip 192.168.2.20
> 
>                      one wireless card (wlan0), ip 10.1.1.2
> 
> machine 2: one wired Ethernet card (eth0), ip 192.168.2.30
> 
>                      one wired Ethernet card (eth1), ip 10.1.1.3
> 
>  
> 
> The intention is to have corosync communicate on the 10.1.1.0 subnet.
> Therefore the subnet 10.1.1.0 is configured in corosync.conf on both
> machines.
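
A quick way to sanity-check that statement is to test which of the addresses listed above fall inside the configured subnet. The sketch below is only an illustration: the /24 prefix length is an assumption (no netmask is given in this thread), and addresses_in_subnet is a made-up helper name, not part of corosync.

```python
import ipaddress


def addresses_in_subnet(addresses, network):
    """Map each address to True/False depending on whether it lies in `network`."""
    net = ipaddress.ip_network(network)
    return {addr: ipaddress.ip_address(addr) in net for addr in addresses}


if __name__ == "__main__":
    # Addresses of machine 1 as given in the post; "/24" is an assumed netmask.
    cl1_addresses = ["192.168.2.20", "10.1.1.2"]
    for addr, inside in addresses_in_subnet(cl1_addresses, "10.1.1.0/24").items():
        print(addr, "in 10.1.1.0/24:", inside)
```

Under that /24 assumption only 10.1.1.2 on machine 1 matches 10.1.1.0, so presumably wlan0 needs to be up and holding that address at the moment corosync starts.
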
> 
>  
> 
> On machine two, corosync seems to work. If I run crm_mon, I get 1 nodes
> configured, and Online [ cl2 ] (which is the hostname of the 2nd machine).
> 
> On the first machine, I don't seem to be able to start corosync
> normally. After starting corosync, I see the following in the daemon.log
> file:
> 
>  
> 
> Please note the errors at the end.
> 
> Jun 20 11:08:29 cl1 corosync[2173]:   [MAIN  ] Corosync Cluster Engine
> ('1.2.0'): started and ready to provide service.
> Jun 20 11:08:29 cl1 corosync[2173]:   [MAIN  ] Corosync built-in
> features: nss
> Jun 20 11:08:29 cl1 corosync[2173]:   [MAIN  ] Successfully read main
> configuration file '/etc/corosync/corosync.conf'.
> Jun 20 11:08:29 cl1 corosync[2173]:   [TOTEM ] Initializing transport
> (UDP/IP).
> Jun 20 11:08:29 cl1 corosync[2173]:   [TOTEM ] Initializing
> transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Jun 20 11:08:29 cl1 corosync[2173]:   [MAIN  ] Compatibility mode set to
> whitetank.  Using V1 and V2 of the synchronization engine.
> Jun 20 11:08:29 cl1 corosync[2173]:   [TOTEM ] The network interface is
> down.
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: process_ais_conf:
> Reading configure
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: config_find_init:
> Local handle: 5650605097994944514 for logging
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: config_find_next:
> Processing additional logging options...
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: get_config_opt:
> Found 'off' for option: debug
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: get_config_opt:
> Found 'no' for option: to_logfile
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: get_config_opt:
> Found 'yes' for option: to_syslog
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: get_config_opt:
> Found 'daemon' for option: syslog_facility
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: config_find_init:
> Local handle: 2730409743423111171 for service
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: config_find_next:
> Processing additional service options...
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: get_config_opt:
> Defaulting to 'pcmk' for option: clustername
> Jun 20 11:08:29 cl1 stonithd: [2181]: notice:
> /usr/lib/heartbeat/stonithd is already running.
> Jun 20 11:08:29 cl1 cib: [2182]: info: Invoked: /usr/lib/heartbeat/cib
> Jun 20 11:08:29 cl1 attrd: [2184]: info: Invoked: /usr/lib/heartbeat/attrd
> Jun 20 11:08:29 cl1 pengine: [2185]: info: Invoked:
> /usr/lib/heartbeat/pengine
> Jun 20 11:08:29 cl1 crmd: [2186]: info: Invoked: /usr/lib/heartbeat/crmd
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: get_config_opt:
> Defaulting to 'no' for option: use_logd
> Jun 20 11:08:29 cl1 stonithd: [2189]: notice:
> /usr/lib/heartbeat/stonithd is already running.
> Jun 20 11:08:29 cl1 cib: [2190]: info: Invoked: /usr/lib/heartbeat/cib
> Jun 20 11:08:29 cl1 lrmd: [2191]: info: Signal sent to pid=2183, waiting
> for process to exit
> Jun 20 11:08:29 cl1 attrd: [2192]: info: Invoked: /usr/lib/heartbeat/attrd
> Jun 20 11:08:29 cl1 pengine: [2193]: info: Invoked:
> /usr/lib/heartbeat/pengine
> Jun 20 11:08:29 cl1 crmd: [2194]: info: Invoked: /usr/lib/heartbeat/crmd
> Jun 20 11:08:29 cl1 cib: [2182]: info: G_main_add_TriggerHandler: Added
> signal manual handler
> Jun 20 11:08:29 cl1 attrd: [2184]: info: main: Starting up
> Jun 20 11:08:29 cl1 pengine: [2185]: info: main: Starting pengine
> Jun 20 11:08:29 cl1 crmd: [2186]: info: main: CRM Hg Version:
> 042548a451fce8400660f6031f4da6f0223dd5dd
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: get_config_opt:
> Defaulting to 'no' for option: use_mgmtd
> Jun 20 11:08:29 cl1 cib: [2190]: info: G_main_add_TriggerHandler: Added
> signal manual handler
> Jun 20 11:08:29 cl1 attrd: [2192]: info: main: Starting up
> Jun 20 11:08:29 cl1 pengine: [2193]: WARN: main: Terminating previous PE
> instance
> Jun 20 11:08:29 cl1 crmd: [2194]: info: main: CRM Hg Version:
> 042548a451fce8400660f6031f4da6f0223dd5dd
> Jun 20 11:08:29 cl1 cib: [2182]: info: G_main_add_SignalHandler: Added
> signal handler for signal 17
> Jun 20 11:08:29 cl1 attrd: [2184]: info: crm_cluster_connect: Connecting
> to OpenAIS
> Jun 20 11:08:29 cl1 crmd: [2186]: info: crmd_init: Starting crmd
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: pcmk_startup: CRM:
> Initialized
> Jun 20 11:08:29 cl1 cib: [2190]: info: G_main_add_SignalHandler: Added
> signal handler for signal 17
> Jun 20 11:08:29 cl1 attrd: [2192]: info: crm_cluster_connect: Connecting
> to OpenAIS
> Jun 20 11:08:29 cl1 pengine: [2185]: WARN: process_pe_message: Received
> quit message, terminating
> Jun 20 11:08:29 cl1 crmd: [2194]: info: crmd_init: Starting crmd
> Jun 20 11:08:29 cl1 cib: [2182]: info: retrieveCib: Reading cluster
> configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
> /var/lib/heartbeat/crm/cib.xml.sig)
> Jun 20 11:08:29 cl1 attrd: [2184]: info: init_ais_connection: Creating
> connection to our AIS plugin
> Jun 20 11:08:29 cl1 crmd: [2186]: info: G_main_add_SignalHandler: Added
> signal handler for signal 17
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] Logging: Initialized
> pcmk_startup
> Jun 20 11:08:29 cl1 cib: [2190]: info: retrieveCib: Reading cluster
> configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
> /var/lib/heartbeat/crm/cib.xml.sig)
> Jun 20 11:08:29 cl1 attrd: [2192]: info: init_ais_connection: Creating
> connection to our AIS plugin
> Jun 20 11:08:29 cl1 crmd: [2194]: info: G_main_add_SignalHandler: Added
> signal handler for signal 17
> Jun 20 11:08:29 cl1 cib: [2182]: info: startCib: CIB Initialization
> completed successfully
> Jun 20 11:08:29 cl1 attrd: [2184]: info: init_ais_connection: AIS
> connection established
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: pcmk_startup:
> Maximum core file size is: 18446744073709551615
> Jun 20 11:08:29 cl1 cib: [2190]: info: startCib: CIB Initialization
> completed successfully
> Jun 20 11:08:29 cl1 attrd: [2192]: info: init_ais_connection: AIS
> connection established
> Jun 20 11:08:29 cl1 cib: [2182]: info: crm_cluster_connect: Connecting
> to OpenAIS
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: pcmk_startup:
> Service: 9
> Jun 20 11:08:29 cl1 cib: [2190]: info: crm_cluster_connect: Connecting
> to OpenAIS
> Jun 20 11:08:29 cl1 attrd: [2192]: info: get_ais_nodeid: Server details:
> id=16777343 uname=cl1 cname=pcmk
> Jun 20 11:08:29 cl1 cib: [2182]: info: init_ais_connection: Creating
> connection to our AIS plugin
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: pcmk_startup: Local
> hostname: cl1
> Jun 20 11:08:29 cl1 cib: [2190]: info: init_ais_connection: Creating
> connection to our AIS plugin
> Jun 20 11:08:29 cl1 attrd: [2192]: info: crm_new_peer: Node cl1 now has
> id: 16777343
> Jun 20 11:08:29 cl1 cib: [2182]: info: init_ais_connection: AIS
> connection established
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: pcmk_update_nodeid:
> Local node id: 16777343
> Jun 20 11:08:29 cl1 cib: [2190]: info: init_ais_connection: AIS
> connection established
> Jun 20 11:08:29 cl1 attrd: [2192]: info: crm_new_peer: Node 16777343 is
> now known as cl1
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: update_member:
> Creating entry for node 16777343 born on 0
> Jun 20 11:08:29 cl1 cib: [2190]: info: get_ais_nodeid: Server details:
> id=16777343 uname=cl1 cname=pcmk
> Jun 20 11:08:29 cl1 attrd: [2192]: info: main: Cluster connection active
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: update_member:
> 0x23d8f00 Node 16777343 now known as cl1 (was: (null))
> Jun 20 11:08:29 cl1 cib: [2190]: info: crm_new_peer: Node cl1 now has
> id: 16777343
> Jun 20 11:08:29 cl1 attrd: [2192]: info: main: Accepting attribute updates
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: update_member: Node
> cl1 now has 1 quorum votes (was 0)
> Jun 20 11:08:29 cl1 cib: [2190]: info: crm_new_peer: Node 16777343 is
> now known as cl1
> Jun 20 11:08:29 cl1 attrd: [2192]: info: main: Starting mainloop...
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: update_member: Node
> 16777343/cl1 is now: member
> Jun 20 11:08:29 cl1 cib: [2190]: info: cib_init: Starting cib mainloop
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: spawn_child: Forked
> child 2181 for process stonithd
> Jun 20 11:08:29 cl1 cib: [2190]: info: ais_dispatch: Membership 24:
> quorum still lost
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: spawn_child: Forked
> child 2182 for process cib
> Jun 20 11:08:29 cl1 cib: [2190]: info: crm_update_peer: Node cl1:
> id=16777343 state=member (new) addr=r(0) ip(127.0.0.1)  (new) votes=1
> (new) born=0 seen=24 proc=00000000000000000000000000013312 (new)
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: spawn_child: Forked
> child 2183 for process lrmd
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: spawn_child: Forked
> child 2184 for process attrd
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: spawn_child: Forked
> child 2185 for process pengine
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: spawn_child: Forked
> child 2186 for process crmd
> Jun 20 11:08:29 cl1 corosync[2173]:   [SERV  ] Service engine loaded:
> Pacemaker Cluster Manager 1.0.8
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: process_ais_conf:
> Reading configure
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: config_find_init:
> Local handle: 7114519016932114436 for logging
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: config_find_next:
> Processing additional logging options...
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: get_config_opt:
> Found 'off' for option: debug
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: get_config_opt:
> Found 'no' for option: to_logfile
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: get_config_opt:
> Found 'yes' for option: to_syslog
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: get_config_opt:
> Found 'daemon' for option: syslog_facility
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: config_find_init:
> Local handle: 4858364909567606789 for service
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: config_find_next:
> Processing additional service options...
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: get_config_opt:
> Defaulting to 'pcmk' for option: clustername
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: get_config_opt:
> Defaulting to 'no' for option: use_logd
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: get_config_opt:
> Defaulting to 'no' for option: use_mgmtd
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: pcmk_startup: CRM:
> Initialized
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] Logging: Initialized
> pcmk_startup
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: pcmk_startup:
> Maximum core file size is: 18446744073709551615
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: pcmk_startup:
> Service: 9
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: pcmk_startup: Local
> hostname: cl1
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: spawn_child: Forked
> child 2189 for process stonithd
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: spawn_child: Forked
> child 2190 for process cib
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: spawn_child: Forked
> child 2191 for process lrmd
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: spawn_child: Forked
> child 2192 for process attrd
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: spawn_child: Forked
> child 2193 for process pengine
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: spawn_child: Forked
> child 2194 for process crmd
> Jun 20 11:08:29 cl1 corosync[2173]:   [SERV  ] Service engine loaded:
> Pacemaker Cluster Manager 1.0.8
> Jun 20 11:08:29 cl1 corosync[2173]:   [SERV  ] Service engine loaded:
> corosync extended virtual synchrony service
> Jun 20 11:08:29 cl1 corosync[2173]:   [SERV  ] Service engine loaded:
> corosync configuration service
> Jun 20 11:08:29 cl1 corosync[2173]:   [SERV  ] Service engine loaded:
> corosync cluster closed process group service v1.01
> Jun 20 11:08:29 cl1 corosync[2173]:   [SERV  ] Service engine loaded:
> corosync cluster config database access v1.01
> Jun 20 11:08:29 cl1 corosync[2173]:   [SERV  ] Service engine loaded:
> corosync profile loading service
> Jun 20 11:08:29 cl1 corosync[2173]:   [SERV  ] Service engine loaded:
> corosync cluster quorum service v0.1
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] notice: pcmk_peer_update:
> Transitional membership event on ring 24: memb=0, new=0, lost=0
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] notice: pcmk_peer_update:
> Stable membership event on ring 24: memb=1, new=1, lost=0
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: update_member:
> Creating entry for node 16777343 born on 24
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: update_member: Node
> 16777343/unknown is now: member
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: pcmk_peer_update:
> NEW:  .pending. 16777343
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: pcmk_peer_update:
> MEMB: .pending. 16777343
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info:
> send_member_notification: Sending membership update 24 to 0 children
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: update_member: Node
> (null) now has process list: 00000000000000000000000000013312
> (78610)
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: update_member: Node
> (null) now has 1 quorum votes (was 0)
> Jun 20 11:08:29 cl1 corosync[2173]:   [TOTEM ] A processor joined or
> left the membership and a new membership was formed.
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: update_member:
> 0x23e27d0 Node 16777343 now known as cl1 (was: (null))
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info:
> send_member_notification: Sending membership update 24 to 0 children
> Jun 20 11:08:29 cl1 corosync[2173]:   [MAIN  ] Completed service
> synchronization, ready to provide service.
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] ERROR: pcmk_ipc: Child
> 2197 spawned to record non-fatal assertion failure line 961:
> transient || mutable->sender.pid == pcmk_children[type].pid
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] ERROR: pcmk_ipc: Sender:
> 2184, child[5]: 2192
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: pcmk_ipc: Recorded
> connection 0x23e35c0 for attrd/2192
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] ERROR: pcmk_ipc: Child
> 2200 spawned to record non-fatal assertion failure line 961:
> transient || mutable->sender.pid == pcmk_children[type].pid
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] ERROR: pcmk_ipc: Sender:
> 2182, child[3]: 2190
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: pcmk_ipc: Recorded
> connection 0x23e65a0 for cib/2190
> Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: pcmk_ipc: Sending
> membership update 24 to cib
> Jun 20 11:08:29 cl1 cib: [2201]: info: write_cib_contents: Archived
> previous version as /var/lib/heartbeat/crm/cib-4.raw
> 
>  
> 
> The only real difference between these two machines is that the corosync
> communication runs over the 10.1.1.0 network, but machine one has a
> wireless adapter for this network and machine two has a wired network
> card for this network.
> 

firewall?

iptables?

run corosync-blackbox

are the wireless and wired networks interconnected?

is your ttl set properly?

I have used wireless and wired connections in a corosync cluster before
and although not recommended for deployment, they work for me.
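
One thing in the log above that may be worth checking: the node id 16777343 appears next to ip(127.0.0.1). Below is a minimal sketch, assuming the id is simply the IPv4 address stored as a little-endian 32-bit integer. That interpretation is inferred from this log, not taken from the corosync documentation, and nodeid_to_ipv4 is a hypothetical helper name.

```python
import ipaddress
import struct


def nodeid_to_ipv4(nodeid: int) -> str:
    """Read a 32-bit node id as an IPv4 address in little-endian byte order.

    The byte order is an assumption based on the log in this thread.
    """
    packed = struct.pack("<I", nodeid)  # 4 bytes, least significant first
    return str(ipaddress.IPv4Address(packed))


if __name__ == "__main__":
    print(nodeid_to_ipv4(16777343))  # prints 127.0.0.1
```

If that reading is right, the id corresponds to loopback rather than 10.1.1.2, which would fit the "The network interface is down" message from TOTEM.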


Regards
-steve

>  
> 
> Could this be the problem? Or do I have to look for something else?
> 
> Would setting debug to "on" help me more?
> 
> Hope someone can shed some light on this problem. I sure can't find
> anything about this using Google.....
> 
>  
> 
> Regards,
> 
>  
> 
> Hans
> 
> 
> 
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
