In this backtrace, pacemaker is failing an assertion (local_nodeid != 0).

I suggest mailing the pacemaker list with your question.

I also suggest using the corosync 1.0.0 release, since trunk is under
active development.

Regards
-steve

On Thu, 2009-07-23 at 18:28 -0600, Jonathan wrote:
> Hello all:
> 
> I am trying to get a corosync-openais-pacemaker cluster stack installed
> and running on a 2 node cluster.
> 
> I am currently using the latest corosync and openais from the svn
> repository (as of July 22).
> The version of pacemaker is the 1.0 tip from today (July 23rd).
> 
> Corosync has been crashing randomly on one of my nodes (the same one,
> consistently). After a crash, I shut down the cluster to upgrade
> corosync, hoping to resolve the problem.  Now pacemaker is causing a
> crash on startup, and I cannot start either node.
> 
> I have attached the debug output from one node in the cluster, and a
> backtrace of the crash from GDB.
> 
> Is there something I can do to reset the cluster state so it will start? Or 
> is this a bug?
> 
> Thanks!
> 
> Jonathan
> 
> plain text document attachment (crash.log)
> Jul 22 21:18:22 corosync [MAIN  ] main.c:717 The Platform is missing process 
> priority setting features.  Leaving at default.
> Jul 22 21:18:22 corosync [MAIN  ] main.c:786 Corosync Cluster Engine 
> ('trunk'): started and ready to provide service.
> Jul 22 21:18:22 corosync [MAIN  ] main.c:867 Successfully configured openais 
> services to load
> Jul 22 21:18:22 corosync [MAIN  ] main.c:867 Successfully read main 
> configuration file '/usr/local/etc/corosync/corosync.conf'.
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:780 Token Timeout (10000 ms) 
> retransmit timeout (495 ms)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:783 token hold (386 ms) 
> retransmits before loss (20 retrans)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:790 join (60 ms) send_join (0 
> ms) consensus (4800 ms) merge (200 ms)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:793 downcheck (1000 ms) fail to 
> recv const (50 msgs)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:795 seqno unchanged const (30 
> rotations) Maximum network MTU 1500
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:799 window size per rotation (50 
> messages) maximum messages per rotation (20 messages)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:802 send threads (0 threads)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:805 RRP token expired timeout 
> (495 ms)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:808 RRP token problem counter 
> (2000 ms)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:811 RRP threshold (10 problem 
> count)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:813 RRP mode set to none.
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:816 heartbeat_failures_allowed 
> (0)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:818 max_network_delay (50 ms)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:841 HeartBeat is Disabled. To 
> enable set heartbeat_failures_allowed > 0
> Jul 22 21:18:22 corosync [TOTEM ] totemudp.c:319 Initializing 
> transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Jul 22 21:18:22 corosync [SERV  ] service.c:206 Service initialized 'openais 
> cluster membership service B.01.01'
> Jul 22 21:18:22 corosync [EVT   ] evt.c:3107 Evt exec init request
> Jul 22 21:18:22 corosync [SERV  ] service.c:206 Service initialized 'openais 
> event service B.01.01'
> Jul 22 21:18:22 corosync [SERV  ] service.c:206 Service initialized 'openais 
> checkpoint service B.01.01'
> Jul 22 21:18:22 corosync [SERV  ] service.c:206 Service initialized 'openais 
> availability management framework B.01.01'
> Jul 22 21:18:22 corosync [MSG   ] msg.c:2404 [DEBUG]: msg_exec_init_fn
> Jul 22 21:18:22 corosync [SERV  ] service.c:206 Service initialized 'openais 
> message service B.03.01'
> Jul 22 21:18:22 corosync [LCK   ] lck.c:1472 [DEBUG]: lck_exec_init_fn
> Jul 22 21:18:22 corosync [SERV  ] service.c:206 Service initialized 'openais 
> distributed locking service B.03.01'
> Jul 22 21:18:22 corosync [SERV  ] service.c:206 Service initialized 'openais 
> timer service A.01.01'
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:313 info: process_ais_conf: 
> Reading configure
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:547 info: config_find_init: Local 
> handle: 5912924842687987714 for logging
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:573 info: config_find_next: 
> Processing additional logging options...
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:599 info: get_config_opt: Found 
> 'on' for option: debug
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:613 info: get_config_opt: 
> Defaulting to 'off' for option: to_file
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:599 info: get_config_opt: Found 
> 'daemon' for option: syslog_facility
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:547 info: config_find_init: Local 
> handle: 3984067077437652995 for service
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:573 info: config_find_next: 
> Processing additional service options...
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:573 info: config_find_next: 
> Processing additional service options...
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:573 info: config_find_next: 
> Processing additional service options...
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:573 info: config_find_next: 
> Processing additional service options...
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:573 info: config_find_next: 
> Processing additional service options...
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:573 info: config_find_next: 
> Processing additional service options...
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:573 info: config_find_next: 
> Processing additional service options...
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:573 info: config_find_next: 
> Processing additional service options...
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:613 info: get_config_opt: 
> Defaulting to 'no' for option: use_logd
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:613 info: get_config_opt: 
> Defaulting to 'no' for option: use_mgmtd
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:404 info: pcmk_plugin_init: CRM: 
> Initialized
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:405 Logging: Initialized 
> pcmk_plugin_init
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:418 info: pcmk_plugin_init: 
> Service: 9
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:419 info: pcmk_plugin_init: Local 
> node id: 0
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:420 info: pcmk_plugin_init: Local 
> hostname: Aries
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:234 info: update_member: Creating 
> entry for node 0 born on 0
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:261 info: update_member: 0x8eb92e8 
> Node 0 now known as Aries (was: (null))
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:277 info: update_member: Node Aries 
> now has 1 quorum votes (was 0)
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:287 info: update_member: Node 
> 0/Aries is now: member
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:143 info: spawn_child: Forked child 
> 9370 for process stonithd
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:143 info: spawn_child: Forked child 
> 9371 for process cib
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:143 info: spawn_child: Forked child 
> 9372 for process lrmd
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:143 info: spawn_child: Forked child 
> 9373 for process attrd
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:143 info: spawn_child: Forked child 
> 9374 for process pengine
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:143 info: spawn_child: Forked child 
> 9375 for process crmd
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:540 info: pcmk_startup: CRM: 
> Initialized
> Jul 22 21:18:22 corosync [SERV  ] service.c:206 Service initialized 
> 'Pacemaker Cluster Manager'
> Jul 22 21:18:22 corosync [SERV  ] service.c:206 Service initialized 'corosync 
> extended virtual synchrony service'
> Jul 22 21:18:22 corosync [SERV  ] service.c:206 Service initialized 'corosync 
> configuration service'
> Jul 22 21:18:22 corosync [SERV  ] service.c:206 Service initialized 'corosync 
> cluster closed process group service v1.01'
> Jul 22 21:18:22 corosync [SERV  ] service.c:206 Service initialized 'corosync 
> cluster config database access v1.01'
> Jul 22 21:18:22 corosync [SERV  ] service.c:206 Service initialized 'corosync 
> profile loading service'
> Jul 22 21:18:22 corosync [MAIN  ] main.c:1010 Compatibility mode set to 
> whitetank.  Using V1 and V2 of the synchronization engine.
> Jul 22 21:18:22 corosync [TOTEM ] totemudp.c:1531 Receive multicast socket 
> recv buffer size (262142 bytes).
> Jul 22 21:18:22 corosync [TOTEM ] totemudp.c:1537 Transmit multicast socket 
> send buffer size (262142 bytes).
> Jul 22 21:18:22 corosync [TOTEM ] totemudp.c:1355 The network interface 
> [172.29.1.1] is now up.
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:4193 Created or loaded sequence 
> id 740.172.29.1.1 for this ring.
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1786 entering GATHER state from 
> 15.
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:2768 Creating commit token 
> because I am the rep.
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1337 Saving state aru 0 high seq 
> received 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3004 Storing new sequence id for 
> ring 2e8
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1825 entering COMMIT state.
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:4060 got commit token
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1857 entering RECOVERY state.
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1886 position [0] member 
> 172.29.1.1:
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1890 previous ring seq 740 rep 
> 172.29.1.1
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1896 aru 0 high delivered 0 
> received flag 1
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:2003 Did not need to originate 
> any messages in recovery.
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:4060 got commit token
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:4114 Sending initial ORF token
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3412 token retrans flag is 0 my 
> set retrans flag0 retrans queue empty 1 count 0, aru 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3423 install seq 0 aru 0 high 
> seq received 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3412 token retrans flag is 0 my 
> set retrans flag0 retrans queue empty 1 count 1, aru 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3423 install seq 0 aru 0 high 
> seq received 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3412 token retrans flag is 0 my 
> set retrans flag0 retrans queue empty 1 count 2, aru 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3423 install seq 0 aru 0 high 
> seq received 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3412 token retrans flag is 0 my 
> set retrans flag0 retrans queue empty 1 count 3, aru 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3423 install seq 0 aru 0 high 
> seq received 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3442 retrans flag count 4 token 
> aru 0 install seq 0 aru 0 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1558 recovery to regular 1-0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1643 Delivering to app 1 to 0
> Jul 22 21:18:22 corosync [CLM   ] clm.c:564 CLM CONFIGURATION CHANGE
> Jul 22 21:18:22 corosync [CLM   ] clm.c:565 New Configuration:
> Jul 22 21:18:22 corosync [CLM   ] clm.c:569 Members Left:
> Jul 22 21:18:22 corosync [CLM   ] clm.c:574 Members Joined:
> Jul 22 21:18:22 corosync [EVT   ] evt.c:2918 Evt conf change 1
> Jul 22 21:18:22 corosync [EVT   ] evt.c:2922 m 0, j 0 l 0
> Jul 22 21:18:22 corosync [LCK   ] lck.c:841 [DEBUG]: lck_confchg_fn
> Jul 22 21:18:22 corosync [MSG   ] msg.c:1085 [DEBUG]: msg_confchg_fn
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:633 notice: pcmk_peer_update: 
> Transitional membership event on ring 744: memb=0, new=0, lost=0
> Jul 22 21:18:22 corosync [CLM   ] clm.c:564 CLM CONFIGURATION CHANGE
> Jul 22 21:18:22 corosync [CLM   ] clm.c:565 New Configuration:
> Jul 22 21:18:22 corosync [CLM   ] clm.c:567     no interface found for nodeid
> Jul 22 21:18:22 corosync [CLM   ] clm.c:569 Members Left:
> Jul 22 21:18:22 corosync [CLM   ] clm.c:574 Members Joined:
> Jul 22 21:18:22 corosync [CLM   ] clm.c:576     no interface found for nodeid
> Jul 22 21:18:22 corosync [EVT   ] evt.c:2918 Evt conf change 0
> Jul 22 21:18:22 corosync [EVT   ] evt.c:2922 m 1, j 1 l 0
> Jul 22 21:18:22 corosync [LCK   ] lck.c:841 [DEBUG]: lck_confchg_fn
> Jul 22 21:18:22 corosync [MSG   ] msg.c:1085 [DEBUG]: msg_confchg_fn
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:633 notice: pcmk_peer_update: 
> Stable membership event on ring 744: memb=1, new=1, lost=0
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:234 info: update_member: Creating 
> entry for node 16850348 born on 744
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:287 info: update_member: Node 
> 16850348/unknown is now: member
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:661 info: pcmk_peer_update: NEW:  
> .pending. 16850348
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:667 debug: pcmk_peer_update: Node 
> 16850348 has address no interface found for nodeid
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:679 info: pcmk_peer_update: MEMB: 
> .pending. 16850348
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:596 info: 
> ais_mark_unseen_peer_dead: Node Aries was not seen in the previous transition
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:287 info: update_member: Node 
> 0/Aries is now: lost
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:712 debug: pcmk_peer_update: 2 
> nodes changed
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:1187 info: 
> send_member_notification: Sending membership update 744 to 0 children
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:1439 CRIT: send_cluster_id: 
> Assertion failure line 1439: local_nodeid != 0
> /usr/local/sbin/aisexec: line 3:  9365 Aborted                 (core dumped) 
> corosync "$@"
> 
> plain text document attachment (debug.session)
> Jul 23 17:18:41 corosync [pcmk  ] plugin.c:679 info: pcmk_peer_update: MEMB: 
> .pending. 16850348
> Jul 23 17:18:41 corosync [pcmk  ] plugin.c:596 info: 
> ais_mark_unseen_peer_dead: Node Aries was not seen in the previous transition
> Jul 23 17:18:41 corosync [pcmk  ] utils.c:287 info: update_member: Node 
> 0/Aries is now: lost
> Jul 23 17:18:41 corosync [pcmk  ] plugin.c:712 debug: pcmk_peer_update: 2 
> nodes changed
> Jul 23 17:18:41 corosync [pcmk  ] plugin.c:1187 info: 
> send_member_notification: Sending membership update 764 to 0 children
> Jul 23 17:18:41 corosync [pcmk  ] plugin.c:1439 CRIT: send_cluster_id: 
> Assertion failure line 1439: local_nodeid != 0
> 
> Program received signal SIGABRT, Aborted.
> [Switching to Thread 0xb7c726c0 (LWP 10526)]
> 0xb7fc8424 in __kernel_vsyscall ()
> (gdb) bt
> #0  0xb7fc8424 in __kernel_vsyscall ()
> #1  0xb7e4fb91 in raise () from /lib/libc.so.6
> #2  0xb7e51378 in abort () from /lib/libc.so.6
> #3  0xb722f2d5 in send_cluster_id () from 
> /usr/local/libexec/lcrso/pacemaker.lcrso
> #4  0xb722b362 in pcmk_peer_update () from 
> /usr/local/libexec/lcrso/pacemaker.lcrso
> #5  0x0804be0e in confchg_fn (configuration_type=TOTEM_CONFIGURATION_REGULAR, 
> member_list=0xbfbdbbb4, member_list_entries=1, left_list=0x0, 
>     left_list_entries=0, joined_list=0xbfbdc7b4, joined_list_entries=1, 
> ring_id=0xb72e0664) at main.c:327
> #6  0xb7fabf92 in app_confchg_fn 
> (configuration_type=TOTEM_CONFIGURATION_REGULAR, member_list=0xbfbdbbb4, 
> member_list_entries=1, left_list=0x0, 
>     left_list_entries=0, joined_list=0xbfbdc7b4, joined_list_entries=1, 
> ring_id=0xb72e0664) at totempg.c:348
> #7  0xb7fabea2 in totempg_confchg_fn 
> (configuration_type=TOTEM_CONFIGURATION_REGULAR, member_list=0xbfbdbbb4, 
> member_list_entries=1, left_list=0x0, 
>     left_list_entries=0, joined_list=0xbfbdc7b4, joined_list_entries=1, 
> ring_id=0xb72e0664) at totempg.c:522
> #8  0xb7fab970 in totemmrp_confchg_fn 
> (configuration_type=TOTEM_CONFIGURATION_REGULAR, member_list=0xbfbdbbb4, 
> member_list_entries=1, left_list=0x0, 
>     left_list_entries=0, joined_list=0xbfbdc7b4, joined_list_entries=1, 
> ring_id=0xb72e0664) at totemmrp.c:109
> #9  0xb7fa452a in memb_state_operational_enter (instance=0xb72bf008) at 
> totemsrp.c:1696
> #10 0xb7fa8e26 in message_handler_orf_token (instance=0xb72bf008, 
> msg=0x9a1df04, msg_len=70, endian_conversion_needed=0) at totemsrp.c:3444
> #11 0xb7fab718 in main_deliver_fn (context=0xb72bf008, msg=0x9a1df04, 
> msg_len=70) at totemsrp.c:4168
> #12 0xb7f9f696 in none_token_recv (rrp_instance=0x9a1a4b8, iface_no=0, 
> context=0xb72bf008, msg=0x9a1df04, msg_len=70, token_seq=3) at totemrrp.c:533
> #13 0xb7fa10de in rrp_deliver_fn (context=0x9a1d8a8, msg=0x9a1df04, 
> msg_len=70) at totemrrp.c:1390
> #14 0xb7f9d64f in net_deliver_fn (handle=7749363892505018368, fd=10, 
> revents=1, data=0x9a1d8c8) at totemudp.c:1221
> #15 0xb7f9a5d4 in poll_run (handle=7749363892505018368) at coropoll.c:393
> #16 0x0804d546 in main (argc=2, argv=0xbfbe30e4) at main.c:1050
> (gdb) 
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
