Chrissie,
Thanks a bunch for the trace. I was unable to reproduce the issue
after 50 or so runs, but the following patch should fix it. Could you
give it a spin?
The issue is a race condition at startup: if the new thread runs
immediately instead of being scheduled later, it signals the condition
variable before the creating thread has started waiting on it, and the
signal is missed.
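To illustrate the pattern the patch relies on, here is a minimal
standalone sketch. It is not the logsys code: the names start_mutex,
start_cond, worker_started, worker and start_worker_and_wait are made up
for the example, and the worker_started flag is an assumption added here
so that the wait can be written as a loop. The creator takes the mutex
before pthread_create, so the new thread cannot signal until the creator
is already blocked in pthread_cond_wait.

```c
#include <pthread.h>

/* Hypothetical names for illustration; not the identifiers in logsys.c. */
static pthread_mutex_t start_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t start_cond = PTHREAD_COND_INITIALIZER;
static int worker_started = 0;	/* predicate, protected by start_mutex */

static void *worker (void *data)
{
	(void)data;

	/*
	 * Blocks here until the creator releases start_mutex inside
	 * pthread_cond_wait, so the signal cannot be sent too early.
	 */
	pthread_mutex_lock (&start_mutex);
	worker_started = 1;
	pthread_cond_signal (&start_cond);
	pthread_mutex_unlock (&start_mutex);
	return NULL;
}

/* Returns 0 once the worker has announced itself, else the create error. */
static int start_worker_and_wait (pthread_t *thread_id)
{
	int res;

	pthread_mutex_lock (&start_mutex);
	worker_started = 0;
	res = pthread_create (thread_id, NULL, worker, NULL);
	if (res != 0) {
		pthread_mutex_unlock (&start_mutex);
		return res;
	}
	/* The predicate loop also guards against spurious wakeups. */
	while (!worker_started) {
		pthread_cond_wait (&start_cond, &start_mutex);
	}
	pthread_mutex_unlock (&start_mutex);
	return 0;
}
```

Calling start_worker_and_wait (&tid) from main and then
pthread_join (tid, NULL) should return promptly regardless of which
thread the scheduler runs first.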
thanks
-steve
On Mon, 2008-11-03 at 14:35 +0000, Christine Caulfield wrote:
> Since logsys2 was committed I can easily make corosync crash with any
> corosync-objctl command. It crashes in a very unhelpful way ... with
> mention of a GDB bug. So it might be stack related I suppose. In
> particular 'corosync-objctl -a' or 'corosync-objctl -w quorum.quorate=1'
>
> Also sometimes I see startup hangs where corosync doesn't even get
> going. These look to be threads deadlocked on logsys condition variables:
>
> (gdb) thr a a bt
>
>
> Thread 3 (Thread 0xb7b39b90 (LWP 6929)):
> #0 0x00110416 in __kernel_vsyscall ()
> #1 0x009f6ba5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
> #2 0x0806e3e9 in wthread_wait () at logsys.c:434
> #3 0x0806e27a in logsys_worker_thread (data=0x0) at logsys.c:452
> #4 0x009f332f in start_thread () from /lib/libpthread.so.0
> #5 0x0092e20e in clone () from /lib/libc.so.6
>
> Thread 2 (Thread 0xb7f36230 (LWP 6928)):
> #0 0x00110416 in __kernel_vsyscall ()
> #1 0x00923a57 in poll () from /lib/libc.so.6
> #2 0x080532be in prioritized_timer_thread (data=0x0) at timer.c:123
> #3 0x009f332f in start_thread () from /lib/libpthread.so.0
> #4 0x0092e20e in clone () from /lib/libc.so.6
>
> Thread 1 (Thread 0xb7f0b6c0 (LWP 6925)):
> #0 0x00110416 in __kernel_vsyscall ()
> #1 0x009f6ba5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
> #2 0x0806e3e9 in wthread_wait () at logsys.c:434
> #3 0x0806e47f in wthread_create () at logsys.c:511
> #4 0x0806e554 in _logsys_wthread_create () at logsys.c:540
> #5 0x0806ef8e in logsys_fork_completed () at logsys.c:790
> #6 0x0804dd89 in main (argc=2, argv=0xbfa36124) at main.c:644
> (gdb)
>
>
> Neither of these happened to me before the logsys2 commit.
>
> Chrissie
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
Index: exec/logsys.c
===================================================================
--- exec/logsys.c (revision 1685)
+++ exec/logsys.c (working copy)
@@ -435,6 +435,12 @@
 	pthread_mutex_unlock (&logsys_cond_mutex);
 }
 
+static inline void wthread_wait_locked (void)
+{
+	pthread_cond_wait (&logsys_cond, &logsys_cond_mutex);
+	pthread_mutex_unlock (&logsys_cond_mutex);
+}
+
 static void *logsys_worker_thread (void *data)
 {
 	int log_msg;
@@ -502,13 +508,15 @@
 	pthread_mutex_init (&logsys_cond_mutex, NULL);
 	pthread_cond_init (&logsys_cond, NULL);
+	pthread_mutex_lock (&logsys_cond_mutex);
 	res = pthread_create (&logsys_thread_id, NULL,
 		logsys_worker_thread, NULL);
+
 	/*
 	 * Wait for thread to be started
 	 */
-	wthread_wait ();
+	wthread_wait_locked ();
 }
 /*