Hi All,
I came across a perhaps slightly obscure segfault in corosync 1.3.0,
as follows:
1) Configure two rings (rrp_mode: passive, if that matters).
2) Start corosync.
3) Drop traffic to one of the rings - in my case, ring 0, via
"iptables -A OUTPUT -d 192.168.4.186 -j DROP".
4) Run "corosync-cfgtool -r" and immediately hit CTRL-C. You might
have to do this a couple of times (or put it in a while loop in
the shell and hit CTRL-C at random), but if you do terminate
corosync-cfgtool, corosync itself will immediately segfault with
the following backtrace:
(gdb) bt
#0  memcpy () at ../sysdeps/i386/i686/memcpy.S:100
#1  0xb774ded3 in coroipcs_response_send (conn=0x80f6f88,
    msg=0xbf7f7c90, mlen=24) at /usr/include/bits/string3.h:52
#2  0xada9dab4 in message_handler_req_exec_cfg_ringreenable
    (message=0xbf7f7d20, nodeid=956606656) at cfg.c:583
#3  0x0804de75 in deliver_fn (nodeid=956606656, msg=0xbf7f7d20,
    msg_len=32, endian_conversion_required=0) at main.c:852
#4  0xb77793c0 in app_deliver_fn (nodeid=956606656,
    msg=<value optimized out>, msg_len=37, endian_conversion_required=0)
    at totempg.c:506
#5  0xb777991e in totempg_deliver_fn (nodeid=956606656,
    msg=0x80f6fea, msg_len=37, endian_conversion_required=0)
    at totempg.c:618
#6  0xb7777a25 in totemmrp_deliver_fn (nodeid=956606656,
    msg=0x80f6fea, msg_len=47, endian_conversion_required=0)
    at totemmrp.c:98
#7  0xb776ed81 in messages_deliver_to_app (instance=0xb5bb4008,
    skip=0, end_point=40) at totemsrp.c:3704
#8  0xb77752a9 in message_handler_orf_token (instance=0xb5bb4008,
    msg=0x80c9dbc, msg_len=71, endian_conversion_needed=0)
    at totemsrp.c:3577
#9  0xb776d3c9 in main_deliver_fn (context=0xb5bb4008, msg=0x80c9dbc,
    msg_len=71) at totemsrp.c:4356
#10 0xb776b27b in passive_token_recv (rrp_instance=0x8082be8,
    iface_no=1, context=0xb5bb4008, msg=0x80c9dbc, msg_len=71,
    token_seq=539) at totemrrp.c:876
#11 0xb776a4a3 in rrp_deliver_fn (context=0x8089358, msg=0x80c9dbc,
    msg_len=71) at totemrrp.c:1500
#12 0xb77651d2 in net_deliver_fn (handle=5880381755227111424, fd=13,
    revents=1, data=0x80c9758) at totemudp.c:1244
#13 0xb7761100 in poll_run (handle=5880381755227111424)
    at coropoll.c:510
#14 0x0804f508 in main (argc=Cannot access memory at address 0x6)
    at main.c:1813
The relevant code is:
int coroipcs_response_send (void *conn, const void *msg, size_t mlen)
{
    struct conn_info *conn_info = (struct conn_info *)conn;

    memcpy (conn_info->response_buffer, msg, mlen);
    ipc_sem_post (conn_info->control_buffer, SEMAPHORE_RESPONSE);
    api->stats_increment_value (conn_info->stats_handle, "responses");
    return (0);
}
At this point, 'conn' points to junk, presumably because
corosync-cfgtool was terminated after sending the ring re-enable
message but before receiving a response, so there's no "other end" for
the IPC.
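
To make the suspected failure mode concrete, here is a minimal,
self-contained sketch of one possible way to guard against it. This is
not corosync code: struct conn_slot, struct conn_handle, conn_open,
conn_close, conn_lookup and response_send_checked are all hypothetical
names I made up for illustration. The idea is that the pending request
carries a (slot index, generation) handle rather than a raw pointer, so
a response that arrives after the client has gone away is dropped
instead of being copied into stale memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CONNS 8
#define RESPONSE_BUFFER_SIZE 64

/* Hypothetical stand-in for corosync's struct conn_info. */
struct conn_slot {
    uint32_t generation;    /* bumped every time the slot is released */
    int in_use;             /* non-zero while a client is connected */
    char response_buffer[RESPONSE_BUFFER_SIZE];
};

/* Hypothetical handle a pending request would carry instead of a
 * raw conn pointer. */
struct conn_handle {
    uint32_t index;
    uint32_t generation;
};

static struct conn_slot conn_table[MAX_CONNS];

/* Claim a free slot. Returns 0 and fills *out, or -1 if the table is full. */
int conn_open(struct conn_handle *out)
{
    for (uint32_t i = 0; i < MAX_CONNS; i++) {
        if (!conn_table[i].in_use) {
            conn_table[i].in_use = 1;
            out->index = i;
            out->generation = conn_table[i].generation;
            return 0;
        }
    }
    return -1;
}

/* Return the live slot for a handle, or NULL if the handle is stale. */
static struct conn_slot *conn_lookup(struct conn_handle h)
{
    if (h.index >= MAX_CONNS)
        return NULL;
    struct conn_slot *slot = &conn_table[h.index];
    if (!slot->in_use || slot->generation != h.generation)
        return NULL;
    return slot;
}

/* Release a slot, e.g. when the client is killed with CTRL-C.
 * Bumping the generation invalidates every outstanding handle. */
void conn_close(struct conn_handle h)
{
    struct conn_slot *slot = conn_lookup(h);
    if (slot != NULL) {
        slot->in_use = 0;
        slot->generation++;
    }
}

/* Copy a response only if the connection still exists and the message
 * fits. Returns 0 on success, -1 if the response was dropped. */
int response_send_checked(struct conn_handle h, const void *msg, size_t mlen)
{
    struct conn_slot *slot = conn_lookup(h);
    if (slot == NULL || mlen > sizeof(slot->response_buffer))
        return -1;
    memcpy(slot->response_buffer, msg, mlen);
    return 0;
}
```

In this sketch, calling conn_close() and then response_send_checked()
with the old handle returns -1 rather than touching the buffer, even if
the slot has since been reused by a new client. Whether something along
these lines fits corosync's IPC layer is a separate question; it is
only meant to show the shape of the problem.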
Given this is a relatively obscure case, my question is: is this
fixable (should I open a bug), or is it more a case of -EDONTDOTHAT?
Thanks,
Tim
--
Tim Serong <[email protected]>
Senior Clustering Engineer, OPS Engineering, Novell Inc.