On Dec 1, 2008, at 12:10 PM, Andrew Beekhof wrote:

> This occurred while the (artificially induced) split cluster was  
> reforming:
>
> Dec  1 11:32:58 c001n16 CTS: debug: Created 3 partitions
> Dec  1 11:32:58 c001n16 CTS: debug: Partition[2]:     ['c001n11']
> Dec  1 11:32:58 c001n16 CTS: debug: Partition[3]:     ['c001n12']
> Dec  1 11:32:58 c001n16 CTS: debug: Partition[4]:     ['c001n09',  
> 'c001n10']
>
> The aisexec process on c001n12 crashed.

Actually, I just realized that c001n11 also crashed in the same place  
and with the same value of group_len[0].
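
A group count of 19595 cannot be valid for a message of this size, so it looks like the header was never sanity checked before the loop started walking it. Below is a rough sketch of the kind of check I have in mind. It is not taken from totempg.c: the function name is made up, and the layout (an unsigned short count, then one unsigned short length per group, then the names) is my assumption based on how group_len is indexed in the gdb session.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of openais.
 * Assumed layout of buf: an unsigned short group count, followed by
 * that many unsigned short name lengths, followed by the names.
 * Returns 1 if the whole header fits inside len bytes, 0 otherwise.
 */
static int group_header_is_sane (const void *buf, size_t len)
{
	unsigned short count;
	unsigned short name_len;
	size_t need;
	size_t i;

	if (len < sizeof (unsigned short)) {
		return (0);
	}
	memcpy (&count, buf, sizeof (count));

	/* room for the count itself plus one length field per group */
	need = sizeof (unsigned short) * ((size_t)count + 1);
	if (need > len) {
		return (0);
	}

	/* add each name length, stopping as soon as we run past the end */
	for (i = 1; i <= count; i++) {
		memcpy (&name_len,
			(const char *)buf + i * sizeof (unsigned short),
			sizeof (name_len));
		need += name_len;
		if (need > len) {
			return (0);
		}
	}
	return (1);
}
```

With 2-byte shorts, a count of 19595 needs 39192 bytes for the length fields alone, so a check like this would reject the 1372-byte message from frame #5 before any name was compared.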

>
>
> #0  0x0805dd46 in group_matches (iovec=0xbf9d670c, iov_len=1,  
> groups_b=0x819ebe0, group_b_cnt=1, adjust_iovec=0xbf9d6714) at  
> totempg.c:364
> #1  0x0805daee in app_deliver_fn (nodeid=163, iovec=0xbf9d670c,  
> iov_len=1, endian_conversion_required=0) at totempg.c:414
> #2  0x0805d8a8 in totempg_deliver_fn (nodeid=163, iovec=0x818b3b8,  
> iov_len=1, endian_conversion_required=0) at totempg.c:591
> #3  0x0805cc23 in totemmrp_deliver_fn (nodeid=163, iovec=0x818b3b8,  
> iov_len=1, endian_conversion_required=0) at totemmrp.c:82
> #4  0x0805a72a in messages_deliver_to_app (instance=0xb74fa008,  
> skip=0, end_point=26) at totemsrp.c:3558
> #5  0x0805ab75 in message_handler_mcast (instance=0xb74fa008,  
> msg=0x8191a7c, msg_len=1372, endian_conversion_needed=0) at  
> totemsrp.c:3689
> #6  0x0805ca6c in main_deliver_fn (context=0xb74fa008,  
> msg=0x8191a7c, msg_len=1372) at totemsrp.c:4132
> #7  0x08050db2 in none_mcast_recv (rrp_instance=0x8190fc8,  
> iface_no=0, context=0xb74fa008, msg=0x8191a7c, msg_len=1372) at  
> totemrrp.c:476
> #8  0x08052708 in rrp_deliver_fn (context=0x8191430, msg=0x8191a7c,  
> msg_len=1372) at totemrrp.c:1319
> #9  0x0804ee4f in net_deliver_fn (handle=0, fd=1, revents=1,  
> data=0x8191450) at totemnet.c:676
> #10 0x0804d376 in poll_run (handle=0) at aispoll.c:382
> #11 0x08064139 in main (argc=1, argv=0xbf9d9104) at main.c:642
>
> (gdb) print i
> $1 = 8960
> (gdb) print group_len[0]
> $2 = 19595
>
> Logs for the process attached (there's lots of recovery going on and  
> reference to a bad message).
>
> Each node uses the last octet of the node's ip addr as its nodeid  
> (configured in openais.conf)
> c001n09.suse.de has address 10.10.222.163
> c001n10.suse.de has address 10.10.222.164
> c001n11.suse.de has address 10.10.222.165
> c001n12.suse.de has address 10.10.222.166
>
> Logs from the other nodes are available if needed.
>
> <splitbrain.logs>
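
For anyone matching the backtrace against the node list: with the nodeid convention quoted above, nodeid=163 in frames #1 to #4 corresponds to 10.10.222.163, which is c001n09. The mapping is nothing more than the following sketch (the helper name is made up for illustration; openais just reads the value from openais.conf):

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper: returns the last octet of a dotted-quad IPv4
 * address, or -1 if the string does not parse.
 */
static int nodeid_from_last_octet (const char *dotted_quad)
{
	struct in_addr addr;

	if (inet_pton (AF_INET, dotted_quad, &addr) != 1) {
		return (-1);
	}
	/* s_addr is in network byte order; convert before masking */
	return ((int)(ntohl (addr.s_addr) & 0xff));
}
```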

