Hello all:
I am trying to get a corosync/openais/Pacemaker cluster stack installed
and running on a two-node cluster.
I am currently using the latest corosync and openais from the SVN
repository (as of today, July 22). Pacemaker is the 1.0 tip from July 19th.
Corosync has been crashing randomly on one of my nodes (the same one,
consistently). Today, after a crash, I shut down the cluster to upgrade
corosync, hoping to resolve the problem. Now Pacemaker is causing a
crash on startup, and I cannot start either node.
I have attached the debug output from one node in the cluster. Is there
something I can do to reset the cluster state so it will start? Or is
this a bug?
Now that Pacemaker will build against openais 1.0.0 and corosync 1.0.0,
should I downgrade to those versions instead of using the latest in SVN?
What do you recommend (version-wise) as a stable stack?
Any help resolving this would be greatly appreciated.
Thanks!
Jonathan deBoer
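For reference, the logging options the pcmk plugin reports finding in the
debug output below correspond to a corosync.conf fragment roughly like this
(reconstructed from the get_config_opt lines; other stanzas omitted, and the
exact syntax in my file may differ slightly -- to_file and the use_logd/use_mgmtd
options were simply left at their defaults):

```
logging {
        debug: on
        syslog_facility: daemon
}
```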
Jul 22 21:18:22 corosync [MAIN ] main.c:717 The Platform is missing process
priority setting features. Leaving at default.
Jul 22 21:18:22 corosync [MAIN ] main.c:786 Corosync Cluster Engine ('trunk'):
started and ready to provide service.
Jul 22 21:18:22 corosync [MAIN ] main.c:867 Successfully configured openais
services to load
Jul 22 21:18:22 corosync [MAIN ] main.c:867 Successfully read main
configuration file '/usr/local/etc/corosync/corosync.conf'.
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:780 Token Timeout (10000 ms)
retransmit timeout (495 ms)
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:783 token hold (386 ms)
retransmits before loss (20 retrans)
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:790 join (60 ms) send_join (0 ms)
consensus (4800 ms) merge (200 ms)
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:793 downcheck (1000 ms) fail to
recv const (50 msgs)
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:795 seqno unchanged const (30
rotations) Maximum network MTU 1500
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:799 window size per rotation (50
messages) maximum messages per rotation (20 messages)
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:802 send threads (0 threads)
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:805 RRP token expired timeout (495
ms)
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:808 RRP token problem counter
(2000 ms)
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:811 RRP threshold (10 problem
count)
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:813 RRP mode set to none.
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:816 heartbeat_failures_allowed (0)
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:818 max_network_delay (50 ms)
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:841 HeartBeat is Disabled. To
enable set heartbeat_failures_allowed > 0
Jul 22 21:18:22 corosync [TOTEM ] totemudp.c:319 Initializing transmit/receive
security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jul 22 21:18:22 corosync [SERV ] service.c:206 Service initialized 'openais
cluster membership service B.01.01'
Jul 22 21:18:22 corosync [EVT ] evt.c:3107 Evt exec init request
Jul 22 21:18:22 corosync [SERV ] service.c:206 Service initialized 'openais
event service B.01.01'
Jul 22 21:18:22 corosync [SERV ] service.c:206 Service initialized 'openais
checkpoint service B.01.01'
Jul 22 21:18:22 corosync [SERV ] service.c:206 Service initialized 'openais
availability management framework B.01.01'
Jul 22 21:18:22 corosync [MSG ] msg.c:2404 [DEBUG]: msg_exec_init_fn
Jul 22 21:18:22 corosync [SERV ] service.c:206 Service initialized 'openais
message service B.03.01'
Jul 22 21:18:22 corosync [LCK ] lck.c:1472 [DEBUG]: lck_exec_init_fn
Jul 22 21:18:22 corosync [SERV ] service.c:206 Service initialized 'openais
distributed locking service B.03.01'
Jul 22 21:18:22 corosync [SERV ] service.c:206 Service initialized 'openais
timer service A.01.01'
Jul 22 21:18:22 corosync [pcmk ] plugin.c:313 info: process_ais_conf: Reading
configure
Jul 22 21:18:22 corosync [pcmk ] utils.c:547 info: config_find_init: Local
handle: 5912924842687987714 for logging
Jul 22 21:18:22 corosync [pcmk ] utils.c:573 info: config_find_next:
Processing additional logging options...
Jul 22 21:18:22 corosync [pcmk ] utils.c:599 info: get_config_opt: Found 'on'
for option: debug
Jul 22 21:18:22 corosync [pcmk ] utils.c:613 info: get_config_opt: Defaulting
to 'off' for option: to_file
Jul 22 21:18:22 corosync [pcmk ] utils.c:599 info: get_config_opt: Found
'daemon' for option: syslog_facility
Jul 22 21:18:22 corosync [pcmk ] utils.c:547 info: config_find_init: Local
handle: 3984067077437652995 for service
Jul 22 21:18:22 corosync [pcmk ] utils.c:573 info: config_find_next:
Processing additional service options...
Jul 22 21:18:22 corosync [pcmk ] utils.c:573 info: config_find_next:
Processing additional service options...
Jul 22 21:18:22 corosync [pcmk ] utils.c:573 info: config_find_next:
Processing additional service options...
Jul 22 21:18:22 corosync [pcmk ] utils.c:573 info: config_find_next:
Processing additional service options...
Jul 22 21:18:22 corosync [pcmk ] utils.c:573 info: config_find_next:
Processing additional service options...
Jul 22 21:18:22 corosync [pcmk ] utils.c:573 info: config_find_next:
Processing additional service options...
Jul 22 21:18:22 corosync [pcmk ] utils.c:573 info: config_find_next:
Processing additional service options...
Jul 22 21:18:22 corosync [pcmk ] utils.c:573 info: config_find_next:
Processing additional service options...
Jul 22 21:18:22 corosync [pcmk ] utils.c:613 info: get_config_opt: Defaulting
to 'no' for option: use_logd
Jul 22 21:18:22 corosync [pcmk ] utils.c:613 info: get_config_opt: Defaulting
to 'no' for option: use_mgmtd
Jul 22 21:18:22 corosync [pcmk ] plugin.c:404 info: pcmk_plugin_init: CRM:
Initialized
Jul 22 21:18:22 corosync [pcmk ] plugin.c:405 Logging: Initialized
pcmk_plugin_init
Jul 22 21:18:22 corosync [pcmk ] plugin.c:418 info: pcmk_plugin_init: Service:
9
Jul 22 21:18:22 corosync [pcmk ] plugin.c:419 info: pcmk_plugin_init: Local
node id: 0
Jul 22 21:18:22 corosync [pcmk ] plugin.c:420 info: pcmk_plugin_init: Local
hostname: Aries
Jul 22 21:18:22 corosync [pcmk ] utils.c:234 info: update_member: Creating
entry for node 0 born on 0
Jul 22 21:18:22 corosync [pcmk ] utils.c:261 info: update_member: 0x8eb92e8
Node 0 now known as Aries (was: (null))
Jul 22 21:18:22 corosync [pcmk ] utils.c:277 info: update_member: Node Aries
now has 1 quorum votes (was 0)
Jul 22 21:18:22 corosync [pcmk ] utils.c:287 info: update_member: Node 0/Aries
is now: member
Jul 22 21:18:22 corosync [pcmk ] utils.c:143 info: spawn_child: Forked child
9370 for process stonithd
Jul 22 21:18:22 corosync [pcmk ] utils.c:143 info: spawn_child: Forked child
9371 for process cib
Jul 22 21:18:22 corosync [pcmk ] utils.c:143 info: spawn_child: Forked child
9372 for process lrmd
Jul 22 21:18:22 corosync [pcmk ] utils.c:143 info: spawn_child: Forked child
9373 for process attrd
Jul 22 21:18:22 corosync [pcmk ] utils.c:143 info: spawn_child: Forked child
9374 for process pengine
Jul 22 21:18:22 corosync [pcmk ] utils.c:143 info: spawn_child: Forked child
9375 for process crmd
Jul 22 21:18:22 corosync [pcmk ] plugin.c:540 info: pcmk_startup: CRM:
Initialized
Jul 22 21:18:22 corosync [SERV ] service.c:206 Service initialized 'Pacemaker
Cluster Manager'
Jul 22 21:18:22 corosync [SERV ] service.c:206 Service initialized 'corosync
extended virtual synchrony service'
Jul 22 21:18:22 corosync [SERV ] service.c:206 Service initialized 'corosync
configuration service'
Jul 22 21:18:22 corosync [SERV ] service.c:206 Service initialized 'corosync
cluster closed process group service v1.01'
Jul 22 21:18:22 corosync [SERV ] service.c:206 Service initialized 'corosync
cluster config database access v1.01'
Jul 22 21:18:22 corosync [SERV ] service.c:206 Service initialized 'corosync
profile loading service'
Jul 22 21:18:22 corosync [MAIN ] main.c:1010 Compatibility mode set to
whitetank. Using V1 and V2 of the synchronization engine.
Jul 22 21:18:22 corosync [TOTEM ] totemudp.c:1531 Receive multicast socket recv
buffer size (262142 bytes).
Jul 22 21:18:22 corosync [TOTEM ] totemudp.c:1537 Transmit multicast socket
send buffer size (262142 bytes).
Jul 22 21:18:22 corosync [TOTEM ] totemudp.c:1355 The network interface
[172.29.1.1] is now up.
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:4193 Created or loaded sequence id
740.172.29.1.1 for this ring.
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1786 entering GATHER state from 15.
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:2768 Creating commit token because
I am the rep.
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1337 Saving state aru 0 high seq
received 0
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3004 Storing new sequence id for
ring 2e8
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1825 entering COMMIT state.
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:4060 got commit token
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1857 entering RECOVERY state.
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1886 position [0] member
172.29.1.1:
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1890 previous ring seq 740 rep
172.29.1.1
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1896 aru 0 high delivered 0
received flag 1
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:2003 Did not need to originate any
messages in recovery.
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:4060 got commit token
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:4114 Sending initial ORF token
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3412 token retrans flag is 0 my
set retrans flag0 retrans queue empty 1 count 0, aru 0
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3423 install seq 0 aru 0 high seq
received 0
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3412 token retrans flag is 0 my
set retrans flag0 retrans queue empty 1 count 1, aru 0
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3423 install seq 0 aru 0 high seq
received 0
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3412 token retrans flag is 0 my
set retrans flag0 retrans queue empty 1 count 2, aru 0
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3423 install seq 0 aru 0 high seq
received 0
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3412 token retrans flag is 0 my
set retrans flag0 retrans queue empty 1 count 3, aru 0
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3423 install seq 0 aru 0 high seq
received 0
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3442 retrans flag count 4 token
aru 0 install seq 0 aru 0 0
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1558 recovery to regular 1-0
Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1643 Delivering to app 1 to 0
Jul 22 21:18:22 corosync [CLM ] clm.c:564 CLM CONFIGURATION CHANGE
Jul 22 21:18:22 corosync [CLM ] clm.c:565 New Configuration:
Jul 22 21:18:22 corosync [CLM ] clm.c:569 Members Left:
Jul 22 21:18:22 corosync [CLM ] clm.c:574 Members Joined:
Jul 22 21:18:22 corosync [EVT ] evt.c:2918 Evt conf change 1
Jul 22 21:18:22 corosync [EVT ] evt.c:2922 m 0, j 0 l 0
Jul 22 21:18:22 corosync [LCK ] lck.c:841 [DEBUG]: lck_confchg_fn
Jul 22 21:18:22 corosync [MSG ] msg.c:1085 [DEBUG]: msg_confchg_fn
Jul 22 21:18:22 corosync [pcmk ] plugin.c:633 notice: pcmk_peer_update:
Transitional membership event on ring 744: memb=0, new=0, lost=0
Jul 22 21:18:22 corosync [CLM ] clm.c:564 CLM CONFIGURATION CHANGE
Jul 22 21:18:22 corosync [CLM ] clm.c:565 New Configuration:
Jul 22 21:18:22 corosync [CLM ] clm.c:567 no interface found for nodeid
Jul 22 21:18:22 corosync [CLM ] clm.c:569 Members Left:
Jul 22 21:18:22 corosync [CLM ] clm.c:574 Members Joined:
Jul 22 21:18:22 corosync [CLM ] clm.c:576 no interface found for nodeid
Jul 22 21:18:22 corosync [EVT ] evt.c:2918 Evt conf change 0
Jul 22 21:18:22 corosync [EVT ] evt.c:2922 m 1, j 1 l 0
Jul 22 21:18:22 corosync [LCK ] lck.c:841 [DEBUG]: lck_confchg_fn
Jul 22 21:18:22 corosync [MSG ] msg.c:1085 [DEBUG]: msg_confchg_fn
Jul 22 21:18:22 corosync [pcmk ] plugin.c:633 notice: pcmk_peer_update: Stable
membership event on ring 744: memb=1, new=1, lost=0
Jul 22 21:18:22 corosync [pcmk ] utils.c:234 info: update_member: Creating
entry for node 16850348 born on 744
Jul 22 21:18:22 corosync [pcmk ] utils.c:287 info: update_member: Node
16850348/unknown is now: member
Jul 22 21:18:22 corosync [pcmk ] plugin.c:661 info: pcmk_peer_update: NEW:
.pending. 16850348
Jul 22 21:18:22 corosync [pcmk ] plugin.c:667 debug: pcmk_peer_update: Node
16850348 has address no interface found for nodeid
Jul 22 21:18:22 corosync [pcmk ] plugin.c:679 info: pcmk_peer_update: MEMB:
.pending. 16850348
Jul 22 21:18:22 corosync [pcmk ] plugin.c:596 info: ais_mark_unseen_peer_dead:
Node Aries was not seen in the previous transition
Jul 22 21:18:22 corosync [pcmk ] utils.c:287 info: update_member: Node 0/Aries
is now: lost
Jul 22 21:18:22 corosync [pcmk ] plugin.c:712 debug: pcmk_peer_update: 2 nodes
changed
Jul 22 21:18:22 corosync [pcmk ] plugin.c:1187 info: send_member_notification:
Sending membership update 744 to 0 children
Jul 22 21:18:22 corosync [pcmk ] plugin.c:1439 CRIT: send_cluster_id:
Assertion failure line 1439: local_nodeid != 0
/usr/local/sbin/aisexec: line 3: 9365 Aborted (core dumped)
corosync "$@"
_______________________________________________
Pacemaker mailing list
[email protected]
http://oss.clusterlabs.org/mailman/listinfo/pacemaker