On 06/22/2010 11:07 AM, Vadym Chepkov wrote:
> On Tue, Jun 22, 2010 at 1:49 PM, Steven Dake<[email protected]> wrote:
>> On 06/22/2010 03:56 AM, Vadym Chepkov wrote:
>>>
>>> Hi,
>>>
>>> I decided to check if I can start using corosync again on several of
>>> my clusters (have to use heartbeat there at the moment).
>>> I don't even have any services defined in corosync.conf, commented
>>> pacemaker out, just plain corosync and it never goes down:
>>>
>>> # ps axf|grep corosync
>>> 26294 pts/0 S+ 0:00 | \_ /bin/sh /sbin/service corosync restart
>>> 26299 pts/0 S+ 0:01 | \_ /bin/bash /etc/init.d/corosync restart
>>> 29249 pts/1 S+ 0:00 \_ grep corosync
>>> 25959 ? Ssl 0:00 corosync
>>>
>>>
>>> I attached to the process and this is where it hangs:
>>>
>>> (gdb) where
>>> #0 0x0fe14134 in poll () from /lib/libc.so.6
>>> #1 0x0ffbc530 in poll_run (handle=150346236434579456) at coropoll.c:413
>>> #2 0x10006e50 in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:1576
>>>
>>> How can I help to debug this problem?
>>> It is 100% reproducible.
>>>
>>> Thank you,
>>> Vadym
>>> ________
>>
>> Vadym,
>>
>> Thanks for the feedback. I do test this scenario and it works for me:
>>
>> [r...@cast flatiron]# service corosync start
>> Starting Corosync Cluster Engine (corosync): [ OK ]
>> [r...@cast flatiron]# service corosync restart
>> Signaling Corosync Cluster Engine (corosync) to terminate: [ OK ]
>> Waiting for corosync services to unload:. [ OK ]
>> Starting Corosync Cluster Engine (corosync): [ OK ]
>> [r...@cast flatiron]# service corosync stop
>> Signaling Corosync Cluster Engine (corosync) to terminate: [ OK ]
>> Waiting for corosync services to unload:. [ OK ]
>> [r...@cast flatiron]# service corosync start
>> Starting Corosync Cluster Engine (corosync): [ OK ]
>> [r...@cast flatiron]# /etc/init.d/corosync restart
>> Signaling Corosync Cluster Engine (corosync) to terminate: [ OK ]
>> Waiting for corosync services to unload:. [ OK ]
>> Starting Corosync Cluster Engine (corosync): [ OK ]
>>
>>
>> One thing that would stop corosync from shutting down is if it couldn't
>> enter operational state. This often happens because of a firewall enabled
>> on the ports corosync uses to communicate.
>>
>> The system logs would be helpful (with debug: on).
>>
>> Regards
>> -steve
>
>
> And it works fine on Intel based servers, but on Redhat PPC based
> server it doesn't
>
> I attached the config and the log file
>
> Thanks,
> Vadym
Nothing jumps out from the logs. Thanks for the pointer about ppc. I'll hunt down some PPC hardware and see if I can reproduce and fix the problem. Could you be more specific about which ppc (32-bit or 64-bit) you were running? Were you running BE and LE in the same cluster?

Please be patient, however. I don't have any ppc hardware personally, and getting access to non-x86 hardware may take me a few days.

Regards
-steve
_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais
