Question about the magic number "2" in the coroipcc.c code. Is the logic to call sem_timedwait, then retry after two seconds if the server has not responded? Why two seconds? Is this to return an error code to the library when the semaphore is deleted in the server?
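The retry logic described in the question can be sketched roughly as follows. This is a minimal sketch, not the code in coroipcc.c. It assumes the client waits with sem_timedwait() in a loop, waking every few seconds to check whether the server still exists instead of blocking forever. IPC_SEMWAIT_TIMEOUT_SEC, server_alive_fn and ipc_sem_wait() are hypothetical names. The macro shows the kind of named constant that could replace the bare "2".

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <semaphore.h>
#include <time.h>

/* Hypothetical name for the retry interval that is currently a bare "2". */
#define IPC_SEMWAIT_TIMEOUT_SEC 2

/* Hypothetical liveness check: returns nonzero while the server side of the
 * connection still exists. */
typedef int (*server_alive_fn)(void *ctx);

/* Wait on sem, waking every IPC_SEMWAIT_TIMEOUT_SEC seconds to check that
 * the server is still alive. Returns 0 when the semaphore was posted, -1 if
 * the server went away or sem_timedwait() failed unexpectedly. */
static int ipc_sem_wait(sem_t *sem, server_alive_fn alive, void *ctx)
{
    struct timespec deadline;

    for (;;) {
        /* sem_timedwait() takes an absolute CLOCK_REALTIME deadline. */
        if (clock_gettime(CLOCK_REALTIME, &deadline) != 0)
            return -1;
        deadline.tv_sec += IPC_SEMWAIT_TIMEOUT_SEC;

        if (sem_timedwait(sem, &deadline) == 0)
            return 0;           /* server posted the semaphore */
        if (errno == EINTR)
            continue;           /* interrupted by a signal: wait again */
        if (errno != ETIMEDOUT)
            return -1;          /* e.g. EINVAL: not a valid semaphore */
        if (!alive(ctx))
            return -1;          /* timed out and the server is gone */
        /* timed out but the server is still alive: keep waiting */
    }
}
```

Polling on a timeout only bounds how long a client can hang. It does not make waiting on a semaphore that the server has already destroyed well defined, so the server-side fix is still needed.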
If so, two seconds sounds OK, but please make it a define in the C file rather than a magic number.

The patch contains two issues. Please commit them as separate patches with separate descriptions.

Good catch on the malloc in the signal handler. glibc should be able to handle that, but it appears that is not the case here.

Good investigative work.

Regards
-steve

On Fri, 2010-01-08 at 12:22 +0100, Jan Friesse wrote:
> Related to https://bugzilla.redhat.com/show_bug.cgi?id=547511
>
> This patch solves the problem in a slightly different (I hope better) way.
>
> It fixes the problem with sem_destroy + sem_wait and also solves a hard
> freeze caused by malloc(*) and other functions being called in the signal
> handler. This is the reason why a special thread is created; the only
> purpose in life of this thread is to wait for the semaphore and begin the
> shutdown sequence.
>
> According to Fabbio, there are still some segfaults left on Fedora 12.
>
> Regards,
> Honza
>
> (*) According to the glibc documentation, malloc and free can be called in
> a signal handler, but in that case I really don't understand this:
> (gdb) bt
> #0 0x00de1424 in __kernel_vsyscall ()
> #1 0x002c7e43 in __lll_lock_wait_private () from /lib/libc.so.6
> #2 0x00250b94 in _L_lock_9571 () from /lib/libc.so.6
> #3 0x0024ebf4 in malloc () from /lib/libc.so.6
> #4 0x08054d26 in hdb_handle_create (handle_database=0x805d748, instance_size=12, handle_id_out=0x805fa48) at ../include/corosync/hdb.h:178
> #5 0x08055422 in schedwrk_create (handle=0x805fa48, schedwrk_fn=0x8050bee <unlink_all_schedwrk_handler>, context=0x805d5a0) at schedwrk.c:104
> #6 0x08050d33 in corosync_service_unlink_all (api=0x805d5a0, unlink_all_complete=0x804b3fb <unlink_all_completed>) at service.c:583
> #7 0x0804b491 in corosync_shutdown_request () at main.c:171
> #8 0x0804b508 in sigintr_handler (num=2) at main.c:195
> #9 <signal handler called>
> #10 0x0024ca2b in _int_malloc () from /lib/libc.so.6
> #11 0x0024ebfe in malloc () from /lib/libc.so.6
> #12 0x0023a7df in __fopen_internal () from /lib/libc.so.6
> #13 0x0023a8ac in fopen@@GLIBC_2.1 () from /lib/libc.so.6
> #14 0x0053352a in pid_to_name (pid=18300, out_name=0xbfda2016 , name_len=32) at coroipcs.c:1515
> #15 0x00533654 in coroipcs_init_conn_stats (conn=0x82d1bb8) at coroipcs.c:1557
> #16 0x00533a29 in coroipcs_handler_dispatch (fd=10, revent=1, context=0x82d1bb8) at coroipcs.c:1670
> #17 0x0804d498 in corosync_poll_handler_dispatch (handle=1197105576937521152, fd=10, revent=1, context=0x82d1bb8) at main.c:911
> #18 0x0057b01b in poll_run (handle=1197105576937521152) at coropoll.c:394
> #19 0x0804ec8e in main (argc=1, argv=0xbfda3434) at main.c:1498
>
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
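The dedicated-thread approach from the patch description can be sketched as below. This is a minimal sketch under stated assumptions, not the code from the patch. POSIX does not list malloc() or fopen() as async-signal-safe. The backtrace is consistent with that: the signal arrived while _int_malloc() was running (frame #10), and the malloc() reached from the handler then blocked on a libc lock (frames #1-#3). In the sketch, the handler only calls sem_post(), which POSIX does list as async-signal-safe. A dedicated thread waits on the semaphore and runs the shutdown work in ordinary thread context. begin_shutdown(), shutdown_thread() and install_shutdown_handling() are hypothetical names. begin_shutdown() stands in for corosync_shutdown_request().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

static sem_t shutdown_sem;
static int shutdown_done;   /* read by other threads only after pthread_join() */

/* Hypothetical stand-in for corosync_shutdown_request(). It runs in normal
 * thread context, so calling malloc()/free() here is safe. */
static void begin_shutdown(void)
{
    char *buf = malloc(64);
    free(buf);
    shutdown_done = 1;
}

/* Signal handler: only an async-signal-safe call, no allocation. */
static void sigintr_handler(int num)
{
    (void)num;
    sem_post(&shutdown_sem);
}

/* The thread's only job: wait for the semaphore, then start shutdown. */
static void *shutdown_thread(void *arg)
{
    (void)arg;
    /* Retry if interrupted; any other error falls through to shutdown. */
    while (sem_wait(&shutdown_sem) != 0 && errno == EINTR)
        ;
    begin_shutdown();
    return NULL;
}

/* Create the semaphore and thread first, then install the handler, so a
 * signal can never post to an uninitialized semaphore. */
static int install_shutdown_handling(pthread_t *tid)
{
    struct sigaction sa;

    if (sem_init(&shutdown_sem, 0, 0) != 0)
        return -1;
    if (pthread_create(tid, NULL, shutdown_thread, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sigintr_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL) == 0 ? 0 : -1;
}
```

A common alternative is to block SIGINT in every thread with pthread_sigmask() and have the dedicated thread call sigwait(). That avoids running a signal handler at all.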
