Hi All,

I came across a perhaps slightly obscure segfault in corosync 1.3.0,
as follows:

1) Configure two rings (rrp_mode: passive, if that matters).
2) Start corosync.
3) Drop traffic to one of the rings - in my case, ring 0, via
   "iptables -A OUTPUT -d 192.168.4.186 -j DROP".
4) Run "corosync-cfgtool -r" and immediately hit CTRL-C.  You might
   have to do this a couple of times (or put it in a while loop in
   the shell and hit CTRL-C at random), but if you do terminate
   corosync-cfgtool, corosync itself will immediately segfault with
   the following backtrace:

(gdb) bt
#0  memcpy () at ../sysdeps/i386/i686/memcpy.S:100
#1  0xb774ded3 in coroipcs_response_send (conn=0x80f6f88,
    msg=0xbf7f7c90, mlen=24) at /usr/include/bits/string3.h:52
#2  0xada9dab4 in message_handler_req_exec_cfg_ringreenable
    (message=0xbf7f7d20, nodeid=956606656) at cfg.c:583
#3  0x0804de75 in deliver_fn (nodeid=956606656, msg=0xbf7f7d20,
    msg_len=32, endian_conversion_required=0) at main.c:852
#4  0xb77793c0 in app_deliver_fn (nodeid=956606656, msg=<value
    optimized out>, msg_len=37, endian_conversion_required=0)
    at totempg.c:506
#5  0xb777991e in totempg_deliver_fn (nodeid=956606656,
    msg=0x80f6fea, msg_len=37, endian_conversion_required=0)
    at totempg.c:618
#6  0xb7777a25 in totemmrp_deliver_fn (nodeid=956606656,
    msg=0x80f6fea, msg_len=47, endian_conversion_required=0) at
    totemmrp.c:98
#7  0xb776ed81 in messages_deliver_to_app (instance=0xb5bb4008,
    skip=0, end_point=40) at totemsrp.c:3704
#8  0xb77752a9 in message_handler_orf_token (instance=0xb5bb4008,
    msg=0x80c9dbc, msg_len=71, endian_conversion_needed=0) at
    totemsrp.c:3577
#9  0xb776d3c9 in main_deliver_fn (context=0xb5bb4008, msg=0x80c9dbc,
    msg_len=71) at totemsrp.c:4356
#10 0xb776b27b in passive_token_recv (rrp_instance=0x8082be8,
    iface_no=1, context=0xb5bb4008, msg=0x80c9dbc, msg_len=71, 
    token_seq=539) at totemrrp.c:876
#11 0xb776a4a3 in rrp_deliver_fn (context=0x8089358, msg=0x80c9dbc,
    msg_len=71) at totemrrp.c:1500
#12 0xb77651d2 in net_deliver_fn (handle=5880381755227111424, fd=13,
    revents=1, data=0x80c9758) at totemudp.c:1244
#13 0xb7761100 in poll_run (handle=5880381755227111424) at
    coropoll.c:510
#14 0x0804f508 in main (argc=Cannot access memory at address 0x6)
    at main.c:1813

The relevant code is:

int coroipcs_response_send (void *conn, const void *msg, size_t mlen)
{
    struct conn_info *conn_info = (struct conn_info *)conn;

    memcpy (conn_info->response_buffer, msg, mlen);

    ipc_sem_post (conn_info->control_buffer, SEMAPHORE_RESPONSE);

    api->stats_increment_value (conn_info->stats_handle, "responses");
    return (0);
}

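At this point, 'conn' points to junk, presumably because
corosync-cfgtool was terminated after sending the ring re-enable
message but before receiving a response. By the time the message is
delivered back to cfg.c, there is no "other end" for the IPC any more,
yet the handler still passes the old connection pointer to
coroipcs_response_send(), and the memcpy into
conn_info->response_buffer faults.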
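
To illustrate the kind of guard I have in mind, here is a minimal,
self-contained sketch. It is not corosync code and not a proposed patch:
every name in it (demo_conn, demo_handle, demo_response_send and so on)
is made up for illustration, and I am only assuming that the crash is a
stale-pointer problem as described above. The idea is that a pending
request remembers a (slot, generation) handle rather than a raw pointer,
so a response to a connection that has since gone away can be detected
and dropped instead of copied into freed memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_CONNS     8
#define DEMO_RESPONSE_SIZE 64

/* Hypothetical stand-in for the daemon's per-connection state. */
struct demo_conn {
    int in_use;                 /* non-zero while a client is attached */
    unsigned int generation;    /* bumped every time the slot is reused */
    char response_buffer[DEMO_RESPONSE_SIZE];
};

/* What a pending request would store instead of a raw pointer. */
struct demo_handle {
    unsigned int slot;
    unsigned int generation;
};

static struct demo_conn demo_table[DEMO_MAX_CONNS];

/* Attach a client: claim a free slot and hand back a handle to it. */
int demo_conn_open(struct demo_handle *h)
{
    unsigned int i;

    assert(h != NULL);
    for (i = 0; i < DEMO_MAX_CONNS; i++) {
        if (!demo_table[i].in_use) {
            demo_table[i].in_use = 1;
            demo_table[i].generation++;
            h->slot = i;
            h->generation = demo_table[i].generation;
            return 0;
        }
    }
    return -1;                  /* table full */
}

/* Detach a client, e.g. because it was killed with CTRL-C. */
void demo_conn_close(struct demo_handle h)
{
    if (h.slot < DEMO_MAX_CONNS &&
        demo_table[h.slot].in_use &&
        demo_table[h.slot].generation == h.generation) {
        demo_table[h.slot].in_use = 0;
    }
}

/* Copy a response only if the handle still names a live connection. */
int demo_response_send(struct demo_handle h, const void *msg, size_t mlen)
{
    struct demo_conn *c;

    if (h.slot >= DEMO_MAX_CONNS)
        return -1;
    c = &demo_table[h.slot];
    if (!c->in_use || c->generation != h.generation)
        return -1;              /* client went away: drop the response */
    if (mlen > sizeof(c->response_buffer))
        return -1;              /* would overflow the buffer */
    memcpy(c->response_buffer, msg, mlen);
    return 0;
}
```

With this, a response sent after demo_conn_close() returns -1 rather
than touching the buffer, and so does a response using an old handle
after the slot has been reused by a new client. Whether something along
these lines fits the real IPC layer is exactly what I am unsure about.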

Given this is a relatively obscure case, my question is: is this
fixable (should I open a bug), or is it more a case of -EDONTDOTHAT?

Thanks,

Tim


-- 
Tim Serong <[email protected]>
Senior Clustering Engineer, OPS Engineering, Novell Inc.



_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais
