Hi Ryan/Steven,

As of corosync-1.2.8/openais-1.1.4, there seems to be a race in the lck.c cleanup code. I am simply running openais/test/testlck against a 1-node cluster, and when testlck exits, corosync segfaults as shown below. It appears that by the time lck_resource_cleanup_find is reached, req_exec_lck_resourceclose->source.conn has already been deallocated/released and contains garbage.
This did not happen with corosync-1.1.2/openais-1.1.0. There was a patch in this area last year (http://marc.info/?l=openais&m=124707755231826&w=2), but I am not sure whether it triggered this behavior. Commenting out the cleanup code in message_handler_req_exec_lck_resourceclose avoids the crash, but of course causes a resource leak. Could you please give me some pointers on how to debug this further?

Also, I've noticed that the patch recommended for FreeBSD (http://marc.info/?l=openais&m=128922243926782&w=2) should definitely be applied on Linux as well: the client trips on that assert from time to time, albeit considerably less frequently than the issue above, which happens 90% of the time.

Thanks in advance,
KM

-------------------------------------------------8<------------------------------------------------

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f5962627700 (LWP 3046)]
lck_resource_cleanup_find (message=0x7fff74145aa0, nodeid=<value optimized out>) at lck.c:1603
1603            if (mar_name_match (resource_name, &cleanup->resource_name)) {
(gdb) bt
#0  lck_resource_cleanup_find (message=0x7fff74145aa0, nodeid=<value optimized out>) at lck.c:1603
#1  message_handler_req_exec_lck_resourceclose (message=0x7fff74145aa0, nodeid=<value optimized out>) at lck.c:2309
#2  0x00000000004073a0 in deliver_fn (nodeid=1, msg=0x7fff74145aa0, msg_len=<value optimized out>, endian_conversion_required=0) at main.c:771
#3  0x00007f5962a555ef in app_deliver_fn (nodeid=1, msg=<value optimized out>, msg_len=<value optimized out>, endian_conversion_required=0) at totempg.c:506
#4  0x00007f5962a55b73 in totempg_deliver_fn (nodeid=1, msg=0x1f43a12, msg_len=0, endian_conversion_required=0) at totempg.c:618
#5  0x00007f5962a4d94f in messages_deliver_to_app (instance=0x7f59607a4010, skip=0, end_point=<value optimized out>) at totemsrp.c:3701
#6  0x00007f5962a53954 in message_handler_orf_token (instance=<value optimized out>, msg=<value optimized out>, msg_len=<value optimized out>, endian_conversion_needed=<value optimized out>) at totemsrp.c:3575
#7  0x00007f5962a49b83 in rrp_deliver_fn (context=0x1efe070, msg=0x1f2347c, msg_len=71) at totemrrp.c:1393
#8  0x00007f5962a48a76 in net_deliver_fn (handle=<value optimized out>, fd=<value optimized out>, revents=<value optimized out>, data=0x1f22db0) at totemudp.c:1244
#9  0x00007f5962a447f2 in poll_run (handle=6344401509261770752) at coropoll.c:510
#10 0x0000000000406add in main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at main.c:1680
(gdb) print resource_name
$1 = (const mar_name_t *) 0x7fff74145ac0
(gdb) print *resource_name
$2 = {length = 19, value = "test_resource_async", '\000' <repeats 236 times>}
(gdb) list
1598                 cleanup_list != &lck_pd->resource_cleanup_list;
1599                 cleanup_list = cleanup_list->next)
1600            {
1601                    cleanup = list_entry (cleanup_list, struct resource_cleanup, cleanup_list);
1602
1603                    if (mar_name_match (resource_name, &cleanup->resource_name)) {
1604                            return (cleanup);
1605                    }
1606            }
1607            return (0);
(gdb) print cleanup_list
$4 = (struct list_head *) 0x6f74206465646461
(gdb) print lck_pd
$5 = (struct lck_pd *) 0x1f3dfb0
(gdb) print *lck_pd
$6 = {resource_list = {next = 0x206465747361636d, prev = 0x206567617373656d}, resource_cleanup_list = {next = 0x6f74206465646461, prev = 0x676e69646e657020}}

_______________________________________________
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais