Hello,

I have found the cause of the crash that occurred only on some deployments. sem_wait is interrupted by a signal (it returns -1 with errno set to EINTR), and the wait operation is not retried, as is customary in POSIX code.
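
For illustration, the customary retry idiom can be sketched as a small wrapper. This is only a sketch: the helper name sem_wait_retry is hypothetical and is not part of the patch, which retries inline in the logsys worker loop instead.

```c
#include <errno.h>
#include <semaphore.h>

/*
 * Hypothetical helper (not part of the patch): wait on a semaphore,
 * retrying whenever the wait is interrupted by a signal.
 * Returns 0 on success, or -1 with errno set for any other failure.
 */
static int sem_wait_retry (sem_t *sem)
{
	int res;

	do {
		res = sem_wait (sem);
	} while (res == -1 && errno == EINTR);

	return res;
}
```

A caller would treat a -1 return from such a helper as a genuine error, since EINTR has already been handled inside the loop.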

A patch to fix this is attached.

A big thank you to Vladislav Bogdanov for running the test case and verifying it fixes the problem.


Regards
-steve
Index: logsys.c
===================================================================
--- logsys.c    (revision 2915)
+++ logsys.c    (working copy)
@@ -661,7 +661,18 @@
        sem_post (&logsys_thread_start);
        for (;;) {
                dropped = 0;
-               sem_wait (&logsys_print_finished);
+retry_sem_wait:
+               res = sem_wait (&logsys_print_finished);
+               if (res == -1 && errno == EINTR) {
+                       goto retry_sem_wait;
+               } else
+               if (res == -1) {
+                       /*
+                        * This case shouldn't happen
+                        */
+                       pthread_exit (NULL);
+               }
+               
 
                logsys_wthread_lock();
                if (wthread_should_exit) {
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker