On Mon, Jun 28, 2010 at 2:20 PM, Keisuke MORI <[email protected]> wrote:
> I've upgraded to pacemaker-1.0.9.1 / corosync-1.2.5 from clusterlabs on
> CentOS 5.5 using yum, but it still sometimes hangs on startup.
>
> The symptom is exactly the same as this:
>  https://lists.linux-foundation.org/pipermail/openais/2010-June/014854.html

Arrgghhh!!!

Can you try the following patch?

Index: exec/main.c
===================================================================
--- exec/main.c.orig    2010-06-21 18:59:32.000000000 +0200
+++ exec/main.c 2010-06-29 08:29:48.834736539 +0200
@@ -425,20 +425,9 @@ static void corosync_tty_detach (void)
        /*
         * Map stdin/out/err to /dev/null.
         */
-       fd = open("/dev/null", O_RDWR);
-       if (fd >= 0) {
-               /* dup2 to 0 / 1 / 2 (stdin / stdout / stderr) */
-               close (STDIN_FILENO);
-               close (STDOUT_FILENO);
-               close (STDERR_FILENO);
-               dup2(fd, STDIN_FILENO);  /* 0 */
-               dup2(fd, STDOUT_FILENO); /* 1 */
-               dup2(fd, STDERR_FILENO); /* 2 */
-
-               /* Should be 0, but just in case it isn't... */
-               if (fd > 2)
-                       close(fd);
-       }
+       freopen("/dev/null", "r", stdin);
+       freopen("/dev/null", "a", stderr);
+       freopen("/dev/null", "a", stdout);
 }

 static void corosync_mlockall (void)
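
For anyone who wants to try the same idea outside the corosync tree, here is a minimal standalone sketch of the freopen()-based redirection used in the patch. The function name detach_stdio is made up for illustration and is not a corosync function, and checking the return values is an addition that the patch itself does not make.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of corosync): point stdin, stderr and
 * stdout at /dev/null with freopen(), in the same order as the patch,
 * but report failure instead of ignoring the return values.
 *
 * Returns 0 on success, -1 if any stream could not be reopened.
 */
int detach_stdio(void)
{
	/* stdin is reopened read-only; reads will see EOF immediately. */
	if (freopen("/dev/null", "r", stdin) == NULL)
		return -1;

	/* stderr and stdout are reopened in append mode; writes are discarded. */
	if (freopen("/dev/null", "a", stderr) == NULL)
		return -1;
	if (freopen("/dev/null", "a", stdout) == NULL)
		return -1;

	return 0;
}
```

Calling detach_stdio() once, after the daemon has forked into the background, should be enough; any later printf() or fprintf(stderr, ...) is then silently dropped.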



>
> Any hints on what I should look into further?
> I have the core with me (taken by gcore), so if you want to look into
> it I can show it to you.
>
> Here is the backtrace.
>
> ----8<--------8<--------8<--------8<--------8<--------8<--------8<----
> [r...@pm01 ~]# gdb /usr/sbin/corosync core.2596
> (...)
> (gdb) where
> #0  0x000000377a607b35 in pthread_join () from /lib64/libpthread.so.0
> #1  0x00002b12ea5528d9 in logsys_atexit () at logsys.c:1642
> #2  0x0000000000405a85 in sigsegv_handler (num=<value optimized out>)
> at main.c:222
> #3  <signal handler called>
> #4  0x0000003779a9a2fa in fork () from /lib64/libc.so.6
> #5  0x00002aaaaaba84de in spawn_child () from 
> /usr/libexec/lcrso/pacemaker.lcrso
> #6  0x00002aaaaabacb9b in pcmk_startup () from
> /usr/libexec/lcrso/pacemaker.lcrso
> #7  0x00000000004082c9 in corosync_service_link_and_init
> (corosync_api=0x613900, service_name=0x1f76c850 "pacemaker",
>    service_ver=0) at service.c:201
> #8  0x0000000000408673 in corosync_service_defaults_link_and_init
> (corosync_api=0x613900) at service.c:534
> #9  0x0000000000405086 in main_service_ready () at main.c:1224
> #10 0x00002b12ea332425 in main_iface_change_fn
> (context=0x2aaaaaaae010, iface_addr=<value optimized out>,
>    iface_no=<value optimized out>) at totemsrp.c:4363
> #11 0x00002b12ea3291a7 in timer_function_netif_check_timeout
> (data=0x1f793520) at totemudp.c:1380
> #12 0x00002b12ea326459 in timerlist_expire (handle=150346236434579456)
> at tlist.h:309
> #13 poll_run (handle=150346236434579456) at coropoll.c:448
> #14 0x0000000000406693 in main (argc=<value optimized out>,
> argv=<value optimized out>) at main.c:1576
> (gdb) q
> [r...@pm01 ~]# rpm -qa | grep corosync
> corosync-1.2.5-1.3.el5
> corosynclib-1.2.5-1.3.el5
> corosync-debuginfo-1.2.5-1.3.el5
> [r...@pm01 ~]# rpm -qa | grep pacemaker
> drbd-pacemaker-8.3.8-1
> pacemaker-libs-1.0.9.1-1.el5
> pacemaker-1.0.9.1-1.el5
> [r...@pm01 ~]# pgrep -lf corosync
> 2589 corosync
> 2596 corosync
> [r...@pm01 ~]# pgrep -lf heartbeat
> 2595 /usr/lib64/heartbeat/stonithd
> 2597 /usr/lib64/heartbeat/lrmd
> 2599 /usr/lib64/heartbeat/pengine
> 3473 /usr/lib64/heartbeat/crmd
>
> ----8<--------8<--------8<--------8<--------8<--------8<--------8<----
>
> Regards,
> --
> Keisuke MORI
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
>