In this backtrace, pacemaker is asserting.
I suggest mailing the pacemaker list with your question.
I also suggest using the corosync 1.0.0 release, since trunk is under
development.
Regards
-steve
On Thu, 2009-07-23 at 18:28 -0600, Jonathan wrote:
> Hello all:
>
> I am trying to get a corosync-openais-pacemaker cluster stack installed
> and running on a 2 node cluster.
>
> I am currently using the latest corosync and openais from the svn
> repository (as of July 22).
> The version of pacemaker is the 1.0 tip from today (July 23rd).
>
> Corosync has been crashing randomly on one of my nodes (the same one,
> consistently). After a crash, I shut down the cluster to upgrade
> corosync, hoping to resolve the problem. Now pacemaker is causing a
> crash on startup, and I cannot start either node.
>
> I have attached the debug output from one node in the cluster, and a
> backtrace from the crash in GDB.
>
> Is there something I can do to reset the cluster state so it will start? Or
> is this a bug?
>
> Thanks!
>
> Jonathan
>
> plain text document attachment (crash.log)
> Jul 22 21:18:22 corosync [MAIN ] main.c:717 The Platform is missing process
> priority setting features. Leaving at default.
> Jul 22 21:18:22 corosync [MAIN ] main.c:786 Corosync Cluster Engine
> ('trunk'): started and ready to provide service.
> Jul 22 21:18:22 corosync [MAIN ] main.c:867 Successfully configured openais
> services to load
> Jul 22 21:18:22 corosync [MAIN ] main.c:867 Successfully read main
> configuration file '/usr/local/etc/corosync/corosync.conf'.
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:780 Token Timeout (10000 ms)
> retransmit timeout (495 ms)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:783 token hold (386 ms)
> retransmits before loss (20 retrans)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:790 join (60 ms) send_join (0
> ms) consensus (4800 ms) merge (200 ms)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:793 downcheck (1000 ms) fail to
> recv const (50 msgs)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:795 seqno unchanged const (30
> rotations) Maximum network MTU 1500
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:799 window size per rotation (50
> messages) maximum messages per rotation (20 messages)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:802 send threads (0 threads)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:805 RRP token expired timeout
> (495 ms)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:808 RRP token problem counter
> (2000 ms)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:811 RRP threshold (10 problem
> count)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:813 RRP mode set to none.
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:816 heartbeat_failures_allowed
> (0)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:818 max_network_delay (50 ms)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:841 HeartBeat is Disabled. To
> enable set heartbeat_failures_allowed > 0
> Jul 22 21:18:22 corosync [TOTEM ] totemudp.c:319 Initializing
> transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Jul 22 21:18:22 corosync [SERV ] service.c:206 Service initialized 'openais
> cluster membership service B.01.01'
> Jul 22 21:18:22 corosync [EVT ] evt.c:3107 Evt exec init request
> Jul 22 21:18:22 corosync [SERV ] service.c:206 Service initialized 'openais
> event service B.01.01'
> Jul 22 21:18:22 corosync [SERV ] service.c:206 Service initialized 'openais
> checkpoint service B.01.01'
> Jul 22 21:18:22 corosync [SERV ] service.c:206 Service initialized 'openais
> availability management framework B.01.01'
> Jul 22 21:18:22 corosync [MSG ] msg.c:2404 [DEBUG]: msg_exec_init_fn
> Jul 22 21:18:22 corosync [SERV ] service.c:206 Service initialized 'openais
> message service B.03.01'
> Jul 22 21:18:22 corosync [LCK ] lck.c:1472 [DEBUG]: lck_exec_init_fn
> Jul 22 21:18:22 corosync [SERV ] service.c:206 Service initialized 'openais
> distributed locking service B.03.01'
> Jul 22 21:18:22 corosync [SERV ] service.c:206 Service initialized 'openais
> timer service A.01.01'
> Jul 22 21:18:22 corosync [pcmk ] plugin.c:313 info: process_ais_conf:
> Reading configure
> Jul 22 21:18:22 corosync [pcmk ] utils.c:547 info: config_find_init: Local
> handle: 5912924842687987714 for logging
> Jul 22 21:18:22 corosync [pcmk ] utils.c:573 info: config_find_next:
> Processing additional logging options...
> Jul 22 21:18:22 corosync [pcmk ] utils.c:599 info: get_config_opt: Found
> 'on' for option: debug
> Jul 22 21:18:22 corosync [pcmk ] utils.c:613 info: get_config_opt:
> Defaulting to 'off' for option: to_file
> Jul 22 21:18:22 corosync [pcmk ] utils.c:599 info: get_config_opt: Found
> 'daemon' for option: syslog_facility
> Jul 22 21:18:22 corosync [pcmk ] utils.c:547 info: config_find_init: Local
> handle: 3984067077437652995 for service
> Jul 22 21:18:22 corosync [pcmk ] utils.c:573 info: config_find_next:
> Processing additional service options...
> Jul 22 21:18:22 corosync [pcmk ] utils.c:573 info: config_find_next:
> Processing additional service options...
> Jul 22 21:18:22 corosync [pcmk ] utils.c:573 info: config_find_next:
> Processing additional service options...
> Jul 22 21:18:22 corosync [pcmk ] utils.c:573 info: config_find_next:
> Processing additional service options...
> Jul 22 21:18:22 corosync [pcmk ] utils.c:573 info: config_find_next:
> Processing additional service options...
> Jul 22 21:18:22 corosync [pcmk ] utils.c:573 info: config_find_next:
> Processing additional service options...
> Jul 22 21:18:22 corosync [pcmk ] utils.c:573 info: config_find_next:
> Processing additional service options...
> Jul 22 21:18:22 corosync [pcmk ] utils.c:573 info: config_find_next:
> Processing additional service options...
> Jul 22 21:18:22 corosync [pcmk ] utils.c:613 info: get_config_opt:
> Defaulting to 'no' for option: use_logd
> Jul 22 21:18:22 corosync [pcmk ] utils.c:613 info: get_config_opt:
> Defaulting to 'no' for option: use_mgmtd
> Jul 22 21:18:22 corosync [pcmk ] plugin.c:404 info: pcmk_plugin_init: CRM:
> Initialized
> Jul 22 21:18:22 corosync [pcmk ] plugin.c:405 Logging: Initialized
> pcmk_plugin_init
> Jul 22 21:18:22 corosync [pcmk ] plugin.c:418 info: pcmk_plugin_init:
> Service: 9
> Jul 22 21:18:22 corosync [pcmk ] plugin.c:419 info: pcmk_plugin_init: Local
> node id: 0
> Jul 22 21:18:22 corosync [pcmk ] plugin.c:420 info: pcmk_plugin_init: Local
> hostname: Aries
> Jul 22 21:18:22 corosync [pcmk ] utils.c:234 info: update_member: Creating
> entry for node 0 born on 0
> Jul 22 21:18:22 corosync [pcmk ] utils.c:261 info: update_member: 0x8eb92e8
> Node 0 now known as Aries (was: (null))
> Jul 22 21:18:22 corosync [pcmk ] utils.c:277 info: update_member: Node Aries
> now has 1 quorum votes (was 0)
> Jul 22 21:18:22 corosync [pcmk ] utils.c:287 info: update_member: Node
> 0/Aries is now: member
> Jul 22 21:18:22 corosync [pcmk ] utils.c:143 info: spawn_child: Forked child
> 9370 for process stonithd
> Jul 22 21:18:22 corosync [pcmk ] utils.c:143 info: spawn_child: Forked child
> 9371 for process cib
> Jul 22 21:18:22 corosync [pcmk ] utils.c:143 info: spawn_child: Forked child
> 9372 for process lrmd
> Jul 22 21:18:22 corosync [pcmk ] utils.c:143 info: spawn_child: Forked child
> 9373 for process attrd
> Jul 22 21:18:22 corosync [pcmk ] utils.c:143 info: spawn_child: Forked child
> 9374 for process pengine
> Jul 22 21:18:22 corosync [pcmk ] utils.c:143 info: spawn_child: Forked child
> 9375 for process crmd
> Jul 22 21:18:22 corosync [pcmk ] plugin.c:540 info: pcmk_startup: CRM:
> Initialized
> Jul 22 21:18:22 corosync [SERV ] service.c:206 Service initialized
> 'Pacemaker Cluster Manager'
> Jul 22 21:18:22 corosync [SERV ] service.c:206 Service initialized 'corosync
> extended virtual synchrony service'
> Jul 22 21:18:22 corosync [SERV ] service.c:206 Service initialized 'corosync
> configuration service'
> Jul 22 21:18:22 corosync [SERV ] service.c:206 Service initialized 'corosync
> cluster closed process group service v1.01'
> Jul 22 21:18:22 corosync [SERV ] service.c:206 Service initialized 'corosync
> cluster config database access v1.01'
> Jul 22 21:18:22 corosync [SERV ] service.c:206 Service initialized 'corosync
> profile loading service'
> Jul 22 21:18:22 corosync [MAIN ] main.c:1010 Compatibility mode set to
> whitetank. Using V1 and V2 of the synchronization engine.
> Jul 22 21:18:22 corosync [TOTEM ] totemudp.c:1531 Receive multicast socket
> recv buffer size (262142 bytes).
> Jul 22 21:18:22 corosync [TOTEM ] totemudp.c:1537 Transmit multicast socket
> send buffer size (262142 bytes).
> Jul 22 21:18:22 corosync [TOTEM ] totemudp.c:1355 The network interface
> [172.29.1.1] is now up.
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:4193 Created or loaded sequence
> id 740.172.29.1.1 for this ring.
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1786 entering GATHER state from
> 15.
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:2768 Creating commit token
> because I am the rep.
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1337 Saving state aru 0 high seq
> received 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3004 Storing new sequence id for
> ring 2e8
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1825 entering COMMIT state.
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:4060 got commit token
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1857 entering RECOVERY state.
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1886 position [0] member
> 172.29.1.1:
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1890 previous ring seq 740 rep
> 172.29.1.1
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1896 aru 0 high delivered 0
> received flag 1
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:2003 Did not need to originate
> any messages in recovery.
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:4060 got commit token
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:4114 Sending initial ORF token
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3412 token retrans flag is 0 my
> set retrans flag0 retrans queue empty 1 count 0, aru 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3423 install seq 0 aru 0 high
> seq received 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3412 token retrans flag is 0 my
> set retrans flag0 retrans queue empty 1 count 1, aru 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3423 install seq 0 aru 0 high
> seq received 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3412 token retrans flag is 0 my
> set retrans flag0 retrans queue empty 1 count 2, aru 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3423 install seq 0 aru 0 high
> seq received 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3412 token retrans flag is 0 my
> set retrans flag0 retrans queue empty 1 count 3, aru 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3423 install seq 0 aru 0 high
> seq received 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3442 retrans flag count 4 token
> aru 0 install seq 0 aru 0 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1558 recovery to regular 1-0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1643 Delivering to app 1 to 0
> Jul 22 21:18:22 corosync [CLM ] clm.c:564 CLM CONFIGURATION CHANGE
> Jul 22 21:18:22 corosync [CLM ] clm.c:565 New Configuration:
> Jul 22 21:18:22 corosync [CLM ] clm.c:569 Members Left:
> Jul 22 21:18:22 corosync [CLM ] clm.c:574 Members Joined:
> Jul 22 21:18:22 corosync [EVT ] evt.c:2918 Evt conf change 1
> Jul 22 21:18:22 corosync [EVT ] evt.c:2922 m 0, j 0 l 0
> Jul 22 21:18:22 corosync [LCK ] lck.c:841 [DEBUG]: lck_confchg_fn
> Jul 22 21:18:22 corosync [MSG ] msg.c:1085 [DEBUG]: msg_confchg_fn
> Jul 22 21:18:22 corosync [pcmk ] plugin.c:633 notice: pcmk_peer_update:
> Transitional membership event on ring 744: memb=0, new=0, lost=0
> Jul 22 21:18:22 corosync [CLM ] clm.c:564 CLM CONFIGURATION CHANGE
> Jul 22 21:18:22 corosync [CLM ] clm.c:565 New Configuration:
> Jul 22 21:18:22 corosync [CLM ] clm.c:567 no interface found for nodeid
> Jul 22 21:18:22 corosync [CLM ] clm.c:569 Members Left:
> Jul 22 21:18:22 corosync [CLM ] clm.c:574 Members Joined:
> Jul 22 21:18:22 corosync [CLM ] clm.c:576 no interface found for nodeid
> Jul 22 21:18:22 corosync [EVT ] evt.c:2918 Evt conf change 0
> Jul 22 21:18:22 corosync [EVT ] evt.c:2922 m 1, j 1 l 0
> Jul 22 21:18:22 corosync [LCK ] lck.c:841 [DEBUG]: lck_confchg_fn
> Jul 22 21:18:22 corosync [MSG ] msg.c:1085 [DEBUG]: msg_confchg_fn
> Jul 22 21:18:22 corosync [pcmk ] plugin.c:633 notice: pcmk_peer_update:
> Stable membership event on ring 744: memb=1, new=1, lost=0
> Jul 22 21:18:22 corosync [pcmk ] utils.c:234 info: update_member: Creating
> entry for node 16850348 born on 744
> Jul 22 21:18:22 corosync [pcmk ] utils.c:287 info: update_member: Node
> 16850348/unknown is now: member
> Jul 22 21:18:22 corosync [pcmk ] plugin.c:661 info: pcmk_peer_update: NEW:
> .pending. 16850348
> Jul 22 21:18:22 corosync [pcmk ] plugin.c:667 debug: pcmk_peer_update: Node
> 16850348 has address no interface found for nodeid
> Jul 22 21:18:22 corosync [pcmk ] plugin.c:679 info: pcmk_peer_update: MEMB:
> .pending. 16850348
> Jul 22 21:18:22 corosync [pcmk ] plugin.c:596 info:
> ais_mark_unseen_peer_dead: Node Aries was not seen in the previous transition
> Jul 22 21:18:22 corosync [pcmk ] utils.c:287 info: update_member: Node
> 0/Aries is now: lost
> Jul 22 21:18:22 corosync [pcmk ] plugin.c:712 debug: pcmk_peer_update: 2
> nodes changed
> Jul 22 21:18:22 corosync [pcmk ] plugin.c:1187 info:
> send_member_notification: Sending membership update 744 to 0 children
> Jul 22 21:18:22 corosync [pcmk ] plugin.c:1439 CRIT: send_cluster_id:
> Assertion failure line 1439: local_nodeid != 0
> /usr/local/sbin/aisexec: line 3: 9365 Aborted (core dumped)
> corosync "$@"
>
> plain text document attachment (debug.session)
> Jul 23 17:18:41 corosync [pcmk ] plugin.c:679 info: pcmk_peer_update: MEMB:
> .pending. 16850348
> Jul 23 17:18:41 corosync [pcmk ] plugin.c:596 info:
> ais_mark_unseen_peer_dead: Node Aries was not seen in the previous transition
> Jul 23 17:18:41 corosync [pcmk ] utils.c:287 info: update_member: Node
> 0/Aries is now: lost
> Jul 23 17:18:41 corosync [pcmk ] plugin.c:712 debug: pcmk_peer_update: 2
> nodes changed
> Jul 23 17:18:41 corosync [pcmk ] plugin.c:1187 info:
> send_member_notification: Sending membership update 764 to 0 children
> Jul 23 17:18:41 corosync [pcmk ] plugin.c:1439 CRIT: send_cluster_id:
> Assertion failure line 1439: local_nodeid != 0
>
> Program received signal SIGABRT, Aborted.
> [Switching to Thread 0xb7c726c0 (LWP 10526)]
> 0xb7fc8424 in __kernel_vsyscall ()
> (gdb) bt
> #0 0xb7fc8424 in __kernel_vsyscall ()
> #1 0xb7e4fb91 in raise () from /lib/libc.so.6
> #2 0xb7e51378 in abort () from /lib/libc.so.6
> #3 0xb722f2d5 in send_cluster_id () from
> /usr/local/libexec/lcrso/pacemaker.lcrso
> #4 0xb722b362 in pcmk_peer_update () from
> /usr/local/libexec/lcrso/pacemaker.lcrso
> #5 0x0804be0e in confchg_fn (configuration_type=TOTEM_CONFIGURATION_REGULAR,
> member_list=0xbfbdbbb4, member_list_entries=1, left_list=0x0,
> left_list_entries=0, joined_list=0xbfbdc7b4, joined_list_entries=1,
> ring_id=0xb72e0664) at main.c:327
> #6 0xb7fabf92 in app_confchg_fn
> (configuration_type=TOTEM_CONFIGURATION_REGULAR, member_list=0xbfbdbbb4,
> member_list_entries=1, left_list=0x0,
> left_list_entries=0, joined_list=0xbfbdc7b4, joined_list_entries=1,
> ring_id=0xb72e0664) at totempg.c:348
> #7 0xb7fabea2 in totempg_confchg_fn
> (configuration_type=TOTEM_CONFIGURATION_REGULAR, member_list=0xbfbdbbb4,
> member_list_entries=1, left_list=0x0,
> left_list_entries=0, joined_list=0xbfbdc7b4, joined_list_entries=1,
> ring_id=0xb72e0664) at totempg.c:522
> #8 0xb7fab970 in totemmrp_confchg_fn
> (configuration_type=TOTEM_CONFIGURATION_REGULAR, member_list=0xbfbdbbb4,
> member_list_entries=1, left_list=0x0,
> left_list_entries=0, joined_list=0xbfbdc7b4, joined_list_entries=1,
> ring_id=0xb72e0664) at totemmrp.c:109
> #9 0xb7fa452a in memb_state_operational_enter (instance=0xb72bf008) at
> totemsrp.c:1696
> #10 0xb7fa8e26 in message_handler_orf_token (instance=0xb72bf008,
> msg=0x9a1df04, msg_len=70, endian_conversion_needed=0) at totemsrp.c:3444
> #11 0xb7fab718 in main_deliver_fn (context=0xb72bf008, msg=0x9a1df04,
> msg_len=70) at totemsrp.c:4168
> #12 0xb7f9f696 in none_token_recv (rrp_instance=0x9a1a4b8, iface_no=0,
> context=0xb72bf008, msg=0x9a1df04, msg_len=70, token_seq=3) at totemrrp.c:533
> #13 0xb7fa10de in rrp_deliver_fn (context=0x9a1d8a8, msg=0x9a1df04,
> msg_len=70) at totemrrp.c:1390
> #14 0xb7f9d64f in net_deliver_fn (handle=7749363892505018368, fd=10,
> revents=1, data=0x9a1d8c8) at totemudp.c:1221
> #15 0xb7f9a5d4 in poll_run (handle=7749363892505018368) at coropoll.c:393
> #16 0x0804d546 in main (argc=2, argv=0xbfbe30e4) at main.c:1050
> (gdb)
_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais