Hi,

 

I just installed two servers, both with Ubuntu 10.04 64-bit. No problems there.

The idea is to create a cluster that is going to serve as a highly available 
Zarafa environment.

One of the steps is to install corosync (which gets installed automatically 
when installing Pacemaker).

I have two NICs on both machines:

machine 1: one wired Ethernet card (eth0), IP 192.168.2.20
           one wireless card (wlan0), IP 10.1.1.2

machine 2: one wired Ethernet card (eth0), IP 192.168.2.30
           one wired Ethernet card (eth1), IP 10.1.1.3

 

The intention is to have corosync communicate on the 10.1.1.0 subnet. Therefore 
the subnet 10.1.1.0 is configured in corosync.conf on both machines.
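
For reference, this is how the 10.1.1.0 value follows from the interface addresses. It is only a minimal sketch: the /24 prefix length is an assumption (the netmask is not stated above), and bind_network is a hypothetical helper name, not part of corosync.

```python
import ipaddress


def bind_network(ip: str, prefix_len: int = 24) -> str:
    """Return the network address of the subnet containing `ip`.

    The /24 default is an assumption; use the netmask the interface really has.
    """
    iface = ipaddress.ip_interface(f"{ip}/{prefix_len}")
    return str(iface.network.network_address)


# Cluster-facing addresses of the two machines described above.
for name, ip in (("machine 1 (wlan0)", "10.1.1.2"), ("machine 2 (eth1)", "10.1.1.3")):
    print(name, ip, "->", bind_network(ip))
```

With a /24 netmask both addresses map to the same network, which is the value that goes into the bindnetaddr setting in corosync.conf.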

 

On machine two, corosync seems to work. When I run crm_mon I get "1 nodes 
configured" and "Online [ cl2 ]" (cl2 is the hostname of the second machine).

On the first machine, I don't seem to be able to start corosync normally. After 
starting corosync, I see the following in the daemon.log file:

 

Please note the errors at the end.

Jun 20 11:08:29 cl1 corosync[2173]:   [MAIN  ] Corosync Cluster Engine 
('1.2.0'): started and ready to provide service. 
Jun 20 11:08:29 cl1 corosync[2173]:   [MAIN  ] Corosync built-in features: nss 
Jun 20 11:08:29 cl1 corosync[2173]:   [MAIN  ] Successfully read main 
configuration file '/etc/corosync/corosync.conf'. 
Jun 20 11:08:29 cl1 corosync[2173]:   [TOTEM ] Initializing transport (UDP/IP). 
Jun 20 11:08:29 cl1 corosync[2173]:   [TOTEM ] Initializing transmit/receive 
security: libtomcrypt SOBER128/SHA1HMAC (mode 0). 
Jun 20 11:08:29 cl1 corosync[2173]:   [MAIN  ] Compatibility mode set to 
whitetank.  Using V1 and V2 of the synchronization engine. 
Jun 20 11:08:29 cl1 corosync[2173]:   [TOTEM ] The network interface is down. 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: process_ais_conf: Reading 
configure 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: config_find_init: Local 
handle: 5650605097994944514 for logging 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: config_find_next: 
Processing additional logging options... 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: get_config_opt: Found 
'off' for option: debug 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: get_config_opt: Found 'no' 
for option: to_logfile 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: get_config_opt: Found 
'yes' for option: to_syslog 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: get_config_opt: Found 
'daemon' for option: syslog_facility 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: config_find_init: Local 
handle: 2730409743423111171 for service 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: config_find_next: 
Processing additional service options... 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: get_config_opt: Defaulting 
to 'pcmk' for option: clustername 
Jun 20 11:08:29 cl1 stonithd: [2181]: notice: /usr/lib/heartbeat/stonithd is 
already running. 
Jun 20 11:08:29 cl1 cib: [2182]: info: Invoked: /usr/lib/heartbeat/cib 
Jun 20 11:08:29 cl1 attrd: [2184]: info: Invoked: /usr/lib/heartbeat/attrd 
Jun 20 11:08:29 cl1 pengine: [2185]: info: Invoked: /usr/lib/heartbeat/pengine 
Jun 20 11:08:29 cl1 crmd: [2186]: info: Invoked: /usr/lib/heartbeat/crmd 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: get_config_opt: Defaulting 
to 'no' for option: use_logd 
Jun 20 11:08:29 cl1 stonithd: [2189]: notice: /usr/lib/heartbeat/stonithd is 
already running. 
Jun 20 11:08:29 cl1 cib: [2190]: info: Invoked: /usr/lib/heartbeat/cib 
Jun 20 11:08:29 cl1 lrmd: [2191]: info: Signal sent to pid=2183, waiting for 
process to exit 
Jun 20 11:08:29 cl1 attrd: [2192]: info: Invoked: /usr/lib/heartbeat/attrd 
Jun 20 11:08:29 cl1 pengine: [2193]: info: Invoked: /usr/lib/heartbeat/pengine 
Jun 20 11:08:29 cl1 crmd: [2194]: info: Invoked: /usr/lib/heartbeat/crmd 
Jun 20 11:08:29 cl1 cib: [2182]: info: G_main_add_TriggerHandler: Added signal 
manual handler 
Jun 20 11:08:29 cl1 attrd: [2184]: info: main: Starting up 
Jun 20 11:08:29 cl1 pengine: [2185]: info: main: Starting pengine 
Jun 20 11:08:29 cl1 crmd: [2186]: info: main: CRM Hg Version: 
042548a451fce8400660f6031f4da6f0223dd5dd 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: get_config_opt: Defaulting 
to 'no' for option: use_mgmtd 
Jun 20 11:08:29 cl1 cib: [2190]: info: G_main_add_TriggerHandler: Added signal 
manual handler 
Jun 20 11:08:29 cl1 attrd: [2192]: info: main: Starting up 
Jun 20 11:08:29 cl1 pengine: [2193]: WARN: main: Terminating previous PE 
instance 
Jun 20 11:08:29 cl1 crmd: [2194]: info: main: CRM Hg Version: 
042548a451fce8400660f6031f4da6f0223dd5dd 
Jun 20 11:08:29 cl1 cib: [2182]: info: G_main_add_SignalHandler: Added signal 
handler for signal 17 
Jun 20 11:08:29 cl1 attrd: [2184]: info: crm_cluster_connect: Connecting to 
OpenAIS 
Jun 20 11:08:29 cl1 crmd: [2186]: info: crmd_init: Starting crmd 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: pcmk_startup: CRM: 
Initialized 
Jun 20 11:08:29 cl1 cib: [2190]: info: G_main_add_SignalHandler: Added signal 
handler for signal 17 
Jun 20 11:08:29 cl1 attrd: [2192]: info: crm_cluster_connect: Connecting to 
OpenAIS 
Jun 20 11:08:29 cl1 pengine: [2185]: WARN: process_pe_message: Received quit 
message, terminating 
Jun 20 11:08:29 cl1 crmd: [2194]: info: crmd_init: Starting crmd 
Jun 20 11:08:29 cl1 cib: [2182]: info: retrieveCib: Reading cluster 
configuration from: /var/lib/heartbeat/crm/cib.xml (digest: 
/var/lib/heartbeat/crm/cib.xml.sig) 
Jun 20 11:08:29 cl1 attrd: [2184]: info: init_ais_connection: Creating 
connection to our AIS plugin 
Jun 20 11:08:29 cl1 crmd: [2186]: info: G_main_add_SignalHandler: Added signal 
handler for signal 17 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] Logging: Initialized 
pcmk_startup 
Jun 20 11:08:29 cl1 cib: [2190]: info: retrieveCib: Reading cluster 
configuration from: /var/lib/heartbeat/crm/cib.xml (digest: 
/var/lib/heartbeat/crm/cib.xml.sig) 
Jun 20 11:08:29 cl1 attrd: [2192]: info: init_ais_connection: Creating 
connection to our AIS plugin 
Jun 20 11:08:29 cl1 crmd: [2194]: info: G_main_add_SignalHandler: Added signal 
handler for signal 17 
Jun 20 11:08:29 cl1 cib: [2182]: info: startCib: CIB Initialization completed 
successfully 
Jun 20 11:08:29 cl1 attrd: [2184]: info: init_ais_connection: AIS connection 
established 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: pcmk_startup: Maximum core 
file size is: 18446744073709551615 
Jun 20 11:08:29 cl1 cib: [2190]: info: startCib: CIB Initialization completed 
successfully 
Jun 20 11:08:29 cl1 attrd: [2192]: info: init_ais_connection: AIS connection 
established 
Jun 20 11:08:29 cl1 cib: [2182]: info: crm_cluster_connect: Connecting to 
OpenAIS 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: pcmk_startup: Service: 9 
Jun 20 11:08:29 cl1 cib: [2190]: info: crm_cluster_connect: Connecting to 
OpenAIS 
Jun 20 11:08:29 cl1 attrd: [2192]: info: get_ais_nodeid: Server details: 
id=16777343 uname=cl1 cname=pcmk 
Jun 20 11:08:29 cl1 cib: [2182]: info: init_ais_connection: Creating connection 
to our AIS plugin 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: pcmk_startup: Local 
hostname: cl1 
Jun 20 11:08:29 cl1 cib: [2190]: info: init_ais_connection: Creating connection 
to our AIS plugin 
Jun 20 11:08:29 cl1 attrd: [2192]: info: crm_new_peer: Node cl1 now has id: 
16777343 
Jun 20 11:08:29 cl1 cib: [2182]: info: init_ais_connection: AIS connection 
established 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: pcmk_update_nodeid: Local 
node id: 16777343 
Jun 20 11:08:29 cl1 cib: [2190]: info: init_ais_connection: AIS connection 
established 
Jun 20 11:08:29 cl1 attrd: [2192]: info: crm_new_peer: Node 16777343 is now 
known as cl1 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: update_member: Creating 
entry for node 16777343 born on 0 
Jun 20 11:08:29 cl1 cib: [2190]: info: get_ais_nodeid: Server details: 
id=16777343 uname=cl1 cname=pcmk 
Jun 20 11:08:29 cl1 attrd: [2192]: info: main: Cluster connection active 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: update_member: 0x23d8f00 
Node 16777343 now known as cl1 (was: (null)) 
Jun 20 11:08:29 cl1 cib: [2190]: info: crm_new_peer: Node cl1 now has id: 
16777343 
Jun 20 11:08:29 cl1 attrd: [2192]: info: main: Accepting attribute updates 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: update_member: Node cl1 
now has 1 quorum votes (was 0) 
Jun 20 11:08:29 cl1 cib: [2190]: info: crm_new_peer: Node 16777343 is now known 
as cl1 
Jun 20 11:08:29 cl1 attrd: [2192]: info: main: Starting mainloop... 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: update_member: Node 
16777343/cl1 is now: member 
Jun 20 11:08:29 cl1 cib: [2190]: info: cib_init: Starting cib mainloop 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: spawn_child: Forked child 
2181 for process stonithd 
Jun 20 11:08:29 cl1 cib: [2190]: info: ais_dispatch: Membership 24: quorum 
still lost 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: spawn_child: Forked child 
2182 for process cib 
Jun 20 11:08:29 cl1 cib: [2190]: info: crm_update_peer: Node cl1: id=16777343 
state=member (new) addr=r(0) ip(127.0.0.1)  (new) votes=1 (new) 
born=0 seen=24 proc=00000000000000000000000000013312 (new) 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: spawn_child: Forked child 
2183 for process lrmd 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: spawn_child: Forked child 
2184 for process attrd 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: spawn_child: Forked child 
2185 for process pengine 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: spawn_child: Forked child 
2186 for process crmd 
Jun 20 11:08:29 cl1 corosync[2173]:   [SERV  ] Service engine loaded: Pacemaker 
Cluster Manager 1.0.8 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: process_ais_conf: Reading 
configure 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: config_find_init: Local 
handle: 7114519016932114436 for logging 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: config_find_next: 
Processing additional logging options... 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: get_config_opt: Found 
'off' for option: debug 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: get_config_opt: Found 'no' 
for option: to_logfile 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: get_config_opt: Found 
'yes' for option: to_syslog 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: get_config_opt: Found 
'daemon' for option: syslog_facility 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: config_find_init: Local 
handle: 4858364909567606789 for service 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: config_find_next: 
Processing additional service options... 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: get_config_opt: Defaulting 
to 'pcmk' for option: clustername 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: get_config_opt: Defaulting 
to 'no' for option: use_logd 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: get_config_opt: Defaulting 
to 'no' for option: use_mgmtd 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: pcmk_startup: CRM: 
Initialized 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] Logging: Initialized 
pcmk_startup 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: pcmk_startup: Maximum core 
file size is: 18446744073709551615 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: pcmk_startup: Service: 9 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: pcmk_startup: Local 
hostname: cl1 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: spawn_child: Forked child 
2189 for process stonithd 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: spawn_child: Forked child 
2190 for process cib 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: spawn_child: Forked child 
2191 for process lrmd 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: spawn_child: Forked child 
2192 for process attrd 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: spawn_child: Forked child 
2193 for process pengine 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: spawn_child: Forked child 
2194 for process crmd 
Jun 20 11:08:29 cl1 corosync[2173]:   [SERV  ] Service engine loaded: Pacemaker 
Cluster Manager 1.0.8 
Jun 20 11:08:29 cl1 corosync[2173]:   [SERV  ] Service engine loaded: corosync 
extended virtual synchrony service 
Jun 20 11:08:29 cl1 corosync[2173]:   [SERV  ] Service engine loaded: corosync 
configuration service 
Jun 20 11:08:29 cl1 corosync[2173]:   [SERV  ] Service engine loaded: corosync 
cluster closed process group service v1.01 
Jun 20 11:08:29 cl1 corosync[2173]:   [SERV  ] Service engine loaded: corosync 
cluster config database access v1.01 
Jun 20 11:08:29 cl1 corosync[2173]:   [SERV  ] Service engine loaded: corosync 
profile loading service 
Jun 20 11:08:29 cl1 corosync[2173]:   [SERV  ] Service engine loaded: corosync 
cluster quorum service v0.1 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] notice: pcmk_peer_update: 
Transitional membership event on ring 24: memb=0, new=0, lost=0 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] notice: pcmk_peer_update: Stable 
membership event on ring 24: memb=1, new=1, lost=0 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: update_member: Creating 
entry for node 16777343 born on 24 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: update_member: Node 
16777343/unknown is now: member 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: pcmk_peer_update: NEW:  
.pending. 16777343 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: pcmk_peer_update: MEMB: 
.pending. 16777343 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: send_member_notification: 
Sending membership update 24 to 0 children 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: update_member: Node (null) 
now has process list: 00000000000000000000000000013312 (78610) 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: update_member: Node (null) 
now has 1 quorum votes (was 0) 
Jun 20 11:08:29 cl1 corosync[2173]:   [TOTEM ] A processor joined or left the 
membership and a new membership was formed. 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: update_member: 0x23e27d0 
Node 16777343 now known as cl1 (was: (null)) 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: send_member_notification: 
Sending membership update 24 to 0 children 
Jun 20 11:08:29 cl1 corosync[2173]:   [MAIN  ] Completed service 
synchronization, ready to provide service. 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] ERROR: pcmk_ipc: Child 2197 
spawned to record non-fatal assertion failure line 961: 
transient || mutable->sender.pid == pcmk_children[type].pid 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] ERROR: pcmk_ipc: Sender: 2184, 
child[5]: 2192 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: pcmk_ipc: Recorded 
connection 0x23e35c0 for attrd/2192 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] ERROR: pcmk_ipc: Child 2200 
spawned to record non-fatal assertion failure line 961: 
transient || mutable->sender.pid == pcmk_children[type].pid 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] ERROR: pcmk_ipc: Sender: 2182, 
child[3]: 2190 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: pcmk_ipc: Recorded 
connection 0x23e65a0 for cib/2190 
Jun 20 11:08:29 cl1 corosync[2173]:   [pcmk  ] info: pcmk_ipc: Sending 
membership update 24 to cib 
Jun 20 11:08:29 cl1 cib: [2201]: info: write_cib_contents: Archived previous 
version as /var/lib/heartbeat/crm/cib-4.raw

 

Corosync communication runs over the 10.1.1.0 network on both machines. The 
only real difference between them is that machine one reaches this network 
through a wireless adapter, while machine two uses a wired network card.
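
One detail in the log above may be worth checking: the node id 16777343 appears next to "ip(127.0.0.1)", shortly after "[TOTEM ] The network interface is down". The sketch below decodes that id under the assumption that corosync derives a default node id from the 32-bit IPv4 address it bound to, stored in little-endian byte order on this x86-64 machine. nodeid_to_ipv4 is a hypothetical helper, not a corosync or Pacemaker function.

```python
import socket
import struct


def nodeid_to_ipv4(nodeid: int) -> str:
    """Interpret a 32-bit node id as an IPv4 address.

    Assumption: the id is the raw address read as a little-endian integer,
    so packing it little-endian restores the bytes in network order.
    """
    return socket.inet_ntoa(struct.pack("<I", nodeid))


# Node id reported for cl1 in the log excerpt.
print(nodeid_to_ipv4(16777343))
```

If that assumption holds, the id corresponds to the loopback address rather than 10.1.1.2, which would suggest corosync did not bind to wlan0 at all when it started.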

 

Could this be the problem? Or do I have to look for something else?

Would setting debug to "on" give me more information?

I hope someone can shed some light on this problem. I can't find anything 
about it using Google.

 

Regards,

 

Hans
_______________________________________________
Openais mailing list
https://lists.linux-foundation.org/mailman/listinfo/openais
