Hello,
I have found the cause of the crash that occurred only on some
deployments: sem_wait is interrupted by a signal, and the wait is not
retried (as is customary in POSIX code when a call fails with EINTR).
The attached patch fixes this.
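For readers who want the idiom in isolation, here is a minimal sketch of
the retry described above, written as a loop rather than the goto used in
the patch. The helper name sem_wait_retry is hypothetical and does not
exist in logsys.c; only sem_wait, errno and EINTR come from POSIX.

```c
#include <errno.h>
#include <semaphore.h>

/*
 * sem_wait_retry is a hypothetical helper name, not part of logsys.c.
 * It waits on sem and restarts the wait whenever a signal handler
 * interrupts it (sem_wait returns -1 with errno set to EINTR).
 * Returns 0 on success, or -1 with errno set for any other failure.
 */
static int sem_wait_retry(sem_t *sem)
{
	int res;

	do {
		res = sem_wait(sem);
	} while (res == -1 && errno == EINTR);

	return res;
}
```

The caller still has to decide what to do when the helper returns -1 for
a reason other than EINTR; the patch handles that case by calling
pthread_exit (NULL).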
A big thank you to Vladislav Bogdanov for running the test case and
verifying it fixes the problem.
Regards
-steve
Index: logsys.c
===================================================================
--- logsys.c (revision 2915)
+++ logsys.c (working copy)
@@ -661,7 +661,18 @@
sem_post (&logsys_thread_start);
for (;;) {
dropped = 0;
- sem_wait (&logsys_print_finished);
+retry_sem_wait:
+ res = sem_wait (&logsys_print_finished);
+ if (res == -1 && errno == EINTR) {
+ goto retry_sem_wait;
+ } else
+ if (res == -1) {
+ /*
+ * This case shouldn't happen
+ */
+ pthread_exit (NULL);
+ }
+
logsys_wthread_lock();
if (wthread_should_exit) {
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker