On Dec 1, 2008, at 12:10 PM, Andrew Beekhof wrote:

> This occurred while the (artificially induced) split cluster was
> reforming:
>
> Dec 1 11:32:58 c001n16 CTS: debug: Created 3 partitions
> Dec 1 11:32:58 c001n16 CTS: debug: Partition[2]: ['c001n11']
> Dec 1 11:32:58 c001n16 CTS: debug: Partition[3]: ['c001n12']
> Dec 1 11:32:58 c001n16 CTS: debug: Partition[4]: ['c001n09',
> 'c001n10']
>
> The aisexec process on c001n12 crashed.

Actually, I just realized that c001n11 also crashed in the same place and with the same value of group_len[0].

>
> #0 0x0805dd46 in group_matches (iovec=0xbf9d670c, iov_len=1,
> groups_b=0x819ebe0, group_b_cnt=1, adjust_iovec=0xbf9d6714) at
> totempg.c:364
> #1 0x0805daee in app_deliver_fn (nodeid=163, iovec=0xbf9d670c,
> iov_len=1, endian_conversion_required=0) at totempg.c:414
> #2 0x0805d8a8 in totempg_deliver_fn (nodeid=163, iovec=0x818b3b8,
> iov_len=1, endian_conversion_required=0) at totempg.c:591
> #3 0x0805cc23 in totemmrp_deliver_fn (nodeid=163, iovec=0x818b3b8,
> iov_len=1, endian_conversion_required=0) at totemmrp.c:82
> #4 0x0805a72a in messages_deliver_to_app (instance=0xb74fa008,
> skip=0, end_point=26) at totemsrp.c:3558
> #5 0x0805ab75 in message_handler_mcast (instance=0xb74fa008,
> msg=0x8191a7c, msg_len=1372, endian_conversion_needed=0) at
> totemsrp.c:3689
> #6 0x0805ca6c in main_deliver_fn (context=0xb74fa008,
> msg=0x8191a7c, msg_len=1372) at totemsrp.c:4132
> #7 0x08050db2 in none_mcast_recv (rrp_instance=0x8190fc8,
> iface_no=0, context=0xb74fa008, msg=0x8191a7c, msg_len=1372) at
> totemrrp.c:476
> #8 0x08052708 in rrp_deliver_fn (context=0x8191430, msg=0x8191a7c,
> msg_len=1372) at totemrrp.c:1319
> #9 0x0804ee4f in net_deliver_fn (handle=0, fd=1, revents=1,
> data=0x8191450) at totemnet.c:676
> #10 0x0804d376 in poll_run (handle=0) at aispoll.c:382
> #11 0x08064139 in main (argc=1, argv=0xbf9d9104) at main.c:642
>
> (gdb) print i
> $1 = 8960
> (gdb) print group_len[0]
> $2 = 19595
>
> Logs for the process attached (there's lots of recovery going on and
> reference to a bad message).
>
> Each node uses the last octet of the node's ip addr as its nodeid
> (configured in openais.conf)
> c001n09.suse.de has address 10.10.222.163
> c001n10.suse.de has address 10.10.222.164
> c001n11.suse.de has address 10.10.222.165
> c001n12.suse.de has address 10.10.222.166
>
> Logs from the other nodes are available if needed.
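
For anyone matching the nodeid values in the backtrace to hosts: the convention quoted above (last octet of the node's IPv4 address) can be sketched as below. This is only an illustration in Python; `nodeid_from_ipv4` and the `NODES` table are hypothetical names of mine, not part of openais, and the authoritative value is whatever is set in openais.conf.

```python
import ipaddress

# Addresses as listed in the quoted message.
NODES = {
    "c001n09.suse.de": "10.10.222.163",
    "c001n10.suse.de": "10.10.222.164",
    "c001n11.suse.de": "10.10.222.165",
    "c001n12.suse.de": "10.10.222.166",
}


def nodeid_from_ipv4(addr: str) -> int:
    """Return the last octet of a dotted-quad IPv4 address.

    Assumption: the nodeid is exactly the low 8 bits of the address,
    as described in the quoted message. Raises ValueError (via
    ipaddress.AddressValueError) if addr is not a valid IPv4 address.
    """
    return int(ipaddress.IPv4Address(addr)) & 0xFF


for host, addr in NODES.items():
    print(host, nodeid_from_ipv4(addr))
```

Under that assumption, nodeid=163 in frames #1 through #3 would correspond to c001n09.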
>
> <splitbrain.logs>

_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais
