Hi,
I have just installed two servers, both running Ubuntu 10.04 64-bit. No problems there.
The idea is to create a cluster that will serve as a highly available
Zarafa environment.
One of the steps is to install corosync (which gets installed automatically
when installing Pacemaker).
Both machines have two NICs:
machine 1: one wired Ethernet card (eth0), IP 192.168.2.20
           one wireless card (wlan0), IP 10.1.1.2
machine 2: one wired Ethernet card (eth0), IP 192.168.2.30
           one wired Ethernet card (eth1), IP 10.1.1.3
The intention is to have corosync communicate on the 10.1.1.0 subnet. Therefore,
the subnet 10.1.1.0 is configured in corosync.conf on both machines.
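As a quick sanity check of the addressing, the Python sketch below tests which interface on each machine has an address inside the cluster subnet. It assumes a /24 netmask (the post only gives "10.1.1.0"), and it only looks at the static addresses listed above: it cannot tell whether wlan0 was actually up and associated at the moment corosync started.

```python
import ipaddress

# Cluster network configured in corosync.conf on both machines.
# ASSUMPTION: /24 prefix length; only "10.1.1.0" is stated.
CLUSTER_NET = ipaddress.ip_network("10.1.1.0/24")

# Interface addresses as listed above.
MACHINES = {
    "machine 1": {"eth0": "192.168.2.20", "wlan0": "10.1.1.2"},
    "machine 2": {"eth0": "192.168.2.30", "eth1": "10.1.1.3"},
}


def cluster_interfaces(ifaces, net=CLUSTER_NET):
    """Return the names of the interfaces whose address lies inside net."""
    return [name for name, addr in ifaces.items()
            if ipaddress.ip_address(addr) in net]


for machine, ifaces in MACHINES.items():
    print(machine, cluster_interfaces(ifaces))
```

On paper, each machine has exactly one candidate interface for the cluster network (wlan0 on machine 1, eth1 on machine 2), so the configuration itself looks symmetric.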
On machine two, corosync seems to work. If I run crm_mon, I get 1 node
configured and Online [ cl2 ] (cl2 is the hostname of the second machine).
On the first machine, I can't seem to start corosync normally. After
starting corosync, I see the following in the daemon.log file. Please note the
errors at the end:
Jun 20 11:08:29 cl1 corosync[2173]: [MAIN ] Corosync Cluster Engine
('1.2.0'): started and ready to provide service.
Jun 20 11:08:29 cl1 corosync[2173]: [MAIN ] Corosync built-in features: nss
Jun 20 11:08:29 cl1 corosync[2173]: [MAIN ] Successfully read main
configuration file '/etc/corosync/corosync.conf'.
Jun 20 11:08:29 cl1 corosync[2173]: [TOTEM ] Initializing transport (UDP/IP).
Jun 20 11:08:29 cl1 corosync[2173]: [TOTEM ] Initializing transmit/receive
security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jun 20 11:08:29 cl1 corosync[2173]: [MAIN ] Compatibility mode set to
whitetank. Using V1 and V2 of the synchronization engine.
Jun 20 11:08:29 cl1 corosync[2173]: [TOTEM ] The network interface is down.
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: process_ais_conf: Reading
configure
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: config_find_init: Local
handle: 5650605097994944514 for logging
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: config_find_next:
Processing additional logging options...
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: get_config_opt: Found
'off' for option: debug
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: get_config_opt: Found 'no'
for option: to_logfile
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: get_config_opt: Found
'yes' for option: to_syslog
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: get_config_opt: Found
'daemon' for option: syslog_facility
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: config_find_init: Local
handle: 2730409743423111171 for service
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: config_find_next:
Processing additional service options...
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: get_config_opt: Defaulting
to 'pcmk' for option: clustername
Jun 20 11:08:29 cl1 stonithd: [2181]: notice: /usr/lib/heartbeat/stonithd is
already running.
Jun 20 11:08:29 cl1 cib: [2182]: info: Invoked: /usr/lib/heartbeat/cib
Jun 20 11:08:29 cl1 attrd: [2184]: info: Invoked: /usr/lib/heartbeat/attrd
Jun 20 11:08:29 cl1 pengine: [2185]: info: Invoked: /usr/lib/heartbeat/pengine
Jun 20 11:08:29 cl1 crmd: [2186]: info: Invoked: /usr/lib/heartbeat/crmd
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: get_config_opt: Defaulting
to 'no' for option: use_logd
Jun 20 11:08:29 cl1 stonithd: [2189]: notice: /usr/lib/heartbeat/stonithd is
already running.
Jun 20 11:08:29 cl1 cib: [2190]: info: Invoked: /usr/lib/heartbeat/cib
Jun 20 11:08:29 cl1 lrmd: [2191]: info: Signal sent to pid=2183, waiting for
process to exit
Jun 20 11:08:29 cl1 attrd: [2192]: info: Invoked: /usr/lib/heartbeat/attrd
Jun 20 11:08:29 cl1 pengine: [2193]: info: Invoked: /usr/lib/heartbeat/pengine
Jun 20 11:08:29 cl1 crmd: [2194]: info: Invoked: /usr/lib/heartbeat/crmd
Jun 20 11:08:29 cl1 cib: [2182]: info: G_main_add_TriggerHandler: Added signal
manual handler
Jun 20 11:08:29 cl1 attrd: [2184]: info: main: Starting up
Jun 20 11:08:29 cl1 pengine: [2185]: info: main: Starting pengine
Jun 20 11:08:29 cl1 crmd: [2186]: info: main: CRM Hg Version:
042548a451fce8400660f6031f4da6f0223dd5dd
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: get_config_opt: Defaulting
to 'no' for option: use_mgmtd
Jun 20 11:08:29 cl1 cib: [2190]: info: G_main_add_TriggerHandler: Added signal
manual handler
Jun 20 11:08:29 cl1 attrd: [2192]: info: main: Starting up
Jun 20 11:08:29 cl1 pengine: [2193]: WARN: main: Terminating previous PE
instance
Jun 20 11:08:29 cl1 crmd: [2194]: info: main: CRM Hg Version:
042548a451fce8400660f6031f4da6f0223dd5dd
Jun 20 11:08:29 cl1 cib: [2182]: info: G_main_add_SignalHandler: Added signal
handler for signal 17
Jun 20 11:08:29 cl1 attrd: [2184]: info: crm_cluster_connect: Connecting to
OpenAIS
Jun 20 11:08:29 cl1 crmd: [2186]: info: crmd_init: Starting crmd
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: pcmk_startup: CRM:
Initialized
Jun 20 11:08:29 cl1 cib: [2190]: info: G_main_add_SignalHandler: Added signal
handler for signal 17
Jun 20 11:08:29 cl1 attrd: [2192]: info: crm_cluster_connect: Connecting to
OpenAIS
Jun 20 11:08:29 cl1 pengine: [2185]: WARN: process_pe_message: Received quit
message, terminating
Jun 20 11:08:29 cl1 crmd: [2194]: info: crmd_init: Starting crmd
Jun 20 11:08:29 cl1 cib: [2182]: info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
Jun 20 11:08:29 cl1 attrd: [2184]: info: init_ais_connection: Creating
connection to our AIS plugin
Jun 20 11:08:29 cl1 crmd: [2186]: info: G_main_add_SignalHandler: Added signal
handler for signal 17
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] Logging: Initialized
pcmk_startup
Jun 20 11:08:29 cl1 cib: [2190]: info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
Jun 20 11:08:29 cl1 attrd: [2192]: info: init_ais_connection: Creating
connection to our AIS plugin
Jun 20 11:08:29 cl1 crmd: [2194]: info: G_main_add_SignalHandler: Added signal
handler for signal 17
Jun 20 11:08:29 cl1 cib: [2182]: info: startCib: CIB Initialization completed
successfully
Jun 20 11:08:29 cl1 attrd: [2184]: info: init_ais_connection: AIS connection
established
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: pcmk_startup: Maximum core
file size is: 18446744073709551615
Jun 20 11:08:29 cl1 cib: [2190]: info: startCib: CIB Initialization completed
successfully
Jun 20 11:08:29 cl1 attrd: [2192]: info: init_ais_connection: AIS connection
established
Jun 20 11:08:29 cl1 cib: [2182]: info: crm_cluster_connect: Connecting to
OpenAIS
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: pcmk_startup: Service: 9
Jun 20 11:08:29 cl1 cib: [2190]: info: crm_cluster_connect: Connecting to
OpenAIS
Jun 20 11:08:29 cl1 attrd: [2192]: info: get_ais_nodeid: Server details:
id=16777343 uname=cl1 cname=pcmk
Jun 20 11:08:29 cl1 cib: [2182]: info: init_ais_connection: Creating connection
to our AIS plugin
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: pcmk_startup: Local
hostname: cl1
Jun 20 11:08:29 cl1 cib: [2190]: info: init_ais_connection: Creating connection
to our AIS plugin
Jun 20 11:08:29 cl1 attrd: [2192]: info: crm_new_peer: Node cl1 now has id:
16777343
Jun 20 11:08:29 cl1 cib: [2182]: info: init_ais_connection: AIS connection
established
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: pcmk_update_nodeid: Local
node id: 16777343
Jun 20 11:08:29 cl1 cib: [2190]: info: init_ais_connection: AIS connection
established
Jun 20 11:08:29 cl1 attrd: [2192]: info: crm_new_peer: Node 16777343 is now
known as cl1
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: update_member: Creating
entry for node 16777343 born on 0
Jun 20 11:08:29 cl1 cib: [2190]: info: get_ais_nodeid: Server details:
id=16777343 uname=cl1 cname=pcmk
Jun 20 11:08:29 cl1 attrd: [2192]: info: main: Cluster connection active
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: update_member: 0x23d8f00
Node 16777343 now known as cl1 (was: (null))
Jun 20 11:08:29 cl1 cib: [2190]: info: crm_new_peer: Node cl1 now has id:
16777343
Jun 20 11:08:29 cl1 attrd: [2192]: info: main: Accepting attribute updates
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: update_member: Node cl1
now has 1 quorum votes (was 0)
Jun 20 11:08:29 cl1 cib: [2190]: info: crm_new_peer: Node 16777343 is now known
as cl1
Jun 20 11:08:29 cl1 attrd: [2192]: info: main: Starting mainloop...
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: update_member: Node
16777343/cl1 is now: member
Jun 20 11:08:29 cl1 cib: [2190]: info: cib_init: Starting cib mainloop
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: spawn_child: Forked child
2181 for process stonithd
Jun 20 11:08:29 cl1 cib: [2190]: info: ais_dispatch: Membership 24: quorum
still lost
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: spawn_child: Forked child
2182 for process cib
Jun 20 11:08:29 cl1 cib: [2190]: info: crm_update_peer: Node cl1: id=16777343
state=member (new) addr=r(0) ip(127.0.0.1) (new) votes=1 (new) born=0 seen=24 proc=00000000000000000000000000013312 (new)
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: spawn_child: Forked child
2183 for process lrmd
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: spawn_child: Forked child
2184 for process attrd
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: spawn_child: Forked child
2185 for process pengine
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: spawn_child: Forked child
2186 for process crmd
Jun 20 11:08:29 cl1 corosync[2173]: [SERV ] Service engine loaded: Pacemaker
Cluster Manager 1.0.8
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: process_ais_conf: Reading
configure
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: config_find_init: Local
handle: 7114519016932114436 for logging
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: config_find_next:
Processing additional logging options...
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: get_config_opt: Found
'off' for option: debug
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: get_config_opt: Found 'no'
for option: to_logfile
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: get_config_opt: Found
'yes' for option: to_syslog
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: get_config_opt: Found
'daemon' for option: syslog_facility
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: config_find_init: Local
handle: 4858364909567606789 for service
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: config_find_next:
Processing additional service options...
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: get_config_opt: Defaulting
to 'pcmk' for option: clustername
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: get_config_opt: Defaulting
to 'no' for option: use_logd
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: get_config_opt: Defaulting
to 'no' for option: use_mgmtd
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: pcmk_startup: CRM:
Initialized
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] Logging: Initialized
pcmk_startup
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: pcmk_startup: Maximum core
file size is: 18446744073709551615
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: pcmk_startup: Service: 9
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: pcmk_startup: Local
hostname: cl1
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: spawn_child: Forked child
2189 for process stonithd
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: spawn_child: Forked child
2190 for process cib
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: spawn_child: Forked child
2191 for process lrmd
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: spawn_child: Forked child
2192 for process attrd
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: spawn_child: Forked child
2193 for process pengine
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: spawn_child: Forked child
2194 for process crmd
Jun 20 11:08:29 cl1 corosync[2173]: [SERV ] Service engine loaded: Pacemaker
Cluster Manager 1.0.8
Jun 20 11:08:29 cl1 corosync[2173]: [SERV ] Service engine loaded: corosync
extended virtual synchrony service
Jun 20 11:08:29 cl1 corosync[2173]: [SERV ] Service engine loaded: corosync
configuration service
Jun 20 11:08:29 cl1 corosync[2173]: [SERV ] Service engine loaded: corosync
cluster closed process group service v1.01
Jun 20 11:08:29 cl1 corosync[2173]: [SERV ] Service engine loaded: corosync
cluster config database access v1.01
Jun 20 11:08:29 cl1 corosync[2173]: [SERV ] Service engine loaded: corosync
profile loading service
Jun 20 11:08:29 cl1 corosync[2173]: [SERV ] Service engine loaded: corosync
cluster quorum service v0.1
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] notice: pcmk_peer_update:
Transitional membership event on ring 24: memb=0, new=0, lost=0
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] notice: pcmk_peer_update: Stable
membership event on ring 24: memb=1, new=1, lost=0
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: update_member: Creating
entry for node 16777343 born on 24
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: update_member: Node
16777343/unknown is now: member
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: pcmk_peer_update: NEW:
.pending. 16777343
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: pcmk_peer_update: MEMB:
.pending. 16777343
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: send_member_notification:
Sending membership update 24 to 0 children
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: update_member: Node (null)
now has process list: 00000000000000000000000000013312 (78610)
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: update_member: Node (null)
now has 1 quorum votes (was 0)
Jun 20 11:08:29 cl1 corosync[2173]: [TOTEM ] A processor joined or left the
membership and a new membership was formed.
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: update_member: 0x23e27d0
Node 16777343 now known as cl1 (was: (null))
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: send_member_notification:
Sending membership update 24 to 0 children
Jun 20 11:08:29 cl1 corosync[2173]: [MAIN ] Completed service
synchronization, ready to provide service.
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] ERROR: pcmk_ipc: Child 2197
spawned to record non-fatal assertion failure line 961: transient || mutable->sender.pid == pcmk_children[type].pid
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] ERROR: pcmk_ipc: Sender: 2184,
child[5]: 2192
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: pcmk_ipc: Recorded
connection 0x23e35c0 for attrd/2192
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] ERROR: pcmk_ipc: Child 2200
spawned to record non-fatal assertion failure line 961: transient || mutable->sender.pid == pcmk_children[type].pid
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] ERROR: pcmk_ipc: Sender: 2182,
child[3]: 2190
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: pcmk_ipc: Recorded
connection 0x23e65a0 for cib/2190
Jun 20 11:08:29 cl1 corosync[2173]: [pcmk ] info: pcmk_ipc: Sending
membership update 24 to cib
Jun 20 11:08:29 cl1 cib: [2201]: info: write_cib_contents: Archived previous
version as /var/lib/heartbeat/crm/cib-4.raw
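The log also hints at where corosync actually bound: the pcmk plugin reports local node id 16777343, and cib reports addr=r(0) ip(127.0.0.1). The sketch below decodes that id. It assumes that corosync 1.2 derives the node id from the bound IPv4 address when no nodeid is set in the totem section (its default for IPv4), and that the four address bytes are read back as an integer on a little-endian (x86) host; both helper names are hypothetical.

```python
import ipaddress
import struct


def nodeid_to_ipv4(nodeid: int) -> str:
    """Decode a node id, ASSUMING it is the IPv4 address bytes (network
    order) read as a 32-bit integer on a little-endian host."""
    # Re-create the in-memory byte layout; those bytes are the address octets.
    raw = struct.pack("<I", nodeid)
    return str(ipaddress.IPv4Address(raw))


def ipv4_to_nodeid(addr: str) -> int:
    """Inverse under the same assumption: the node id an address would give."""
    return struct.unpack("<I", ipaddress.IPv4Address(addr).packed)[0]


print(nodeid_to_ipv4(16777343))    # 127.0.0.1
print(ipv4_to_nodeid("10.1.1.2"))  # 33620234
```

Under these assumptions, node id 16777343 is the loopback address, which matches the "[TOTEM ] The network interface is down." line near the start of the log; had corosync bound to wlan0 (10.1.1.2), the id would have been 33620234.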
The only real difference between these two machines is that, although corosync
communicates over the 10.1.1.0 network on both, machine one uses a wireless
adapter for this network, whereas machine two uses a wired network card.
Could this be the problem? Or should I be looking for something else?
Would setting debug to "on" tell me more?
I hope someone can shed some light on this problem; I can't find anything
about it using Google.
Regards,
Hans
_______________________________________________
Openais mailing list
https://lists.linux-foundation.org/mailman/listinfo/openais