On Tue, Jun 22, 2010 at 2:42 PM, Steven Dake <[email protected]> wrote:
> On 06/22/2010 11:31 AM, Vadym Chepkov wrote:
>>
>> On Tue, Jun 22, 2010 at 2:21 PM, Steven Dake<[email protected]> wrote:
>>>
>>> On 06/22/2010 11:07 AM, Vadym Chepkov wrote:
>>>>
>>>> On Tue, Jun 22, 2010 at 1:49 PM, Steven Dake<[email protected]> wrote:
>>>>>
>>>>> On 06/22/2010 03:56 AM, Vadym Chepkov wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I decided to check if I can start using corosync again on several of
>>>>>> my clusters (have to use heartbeat there at the moment).
>>>>>> I don't even have any services defined in corosync.conf, commented
>>>>>> pacemaker out, just plain corosync and it never goes down:
>>>>>>
>>>>>> # ps axf|grep corosync
>>>>>> 26294 pts/0 S+ 0:00 | \_ /bin/sh /sbin/service corosync restart
>>>>>> 26299 pts/0 S+ 0:01 | \_ /bin/bash /etc/init.d/corosync restart
>>>>>> 29249 pts/1 S+ 0:00 \_ grep corosync
>>>>>> 25959 ? Ssl 0:00 corosync
>>>>>>
>>>>>>
>>>>>> I attached to the process and this is where it hangs:
>>>>>>
>>>>>> (gdb) where
>>>>>> #0 0x0fe14134 in poll () from /lib/libc.so.6
>>>>>> #1 0x0ffbc530 in poll_run (handle=150346236434579456) at coropoll.c:413
>>>>>> #2 0x10006e50 in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:1576
>>>>>>
>>>>>> How can I help to debug this problem?
>>>>>> It is 100% reproducible.
>>>>>>
>>>>>> Thank you,
>>>>>> Vadym
>>>>>> ________
>>>>>
>>>>> Vadym,
>>>>>
>>>>> Thanks for the feedback. I do test this scenario and it works for me:
>>>>>
>>>>> [r...@cast flatiron]# service corosync start
>>>>> Starting Corosync Cluster Engine (corosync): [ OK ]
>>>>> [r...@cast flatiron]# service corosync restart
>>>>> Signaling Corosync Cluster Engine (corosync) to terminate: [ OK ]
>>>>> Waiting for corosync services to unload:. [ OK ]
>>>>> Starting Corosync Cluster Engine (corosync): [ OK ]
>>>>> [r...@cast flatiron]# service corosync stop
>>>>> Signaling Corosync Cluster Engine (corosync) to terminate: [ OK ]
>>>>> Waiting for corosync services to unload:. [ OK ]
>>>>> [r...@cast flatiron]# service corosync start
>>>>> Starting Corosync Cluster Engine (corosync): [ OK ]
>>>>> [r...@cast flatiron]# /etc/init.d/corosync restart
>>>>> Signaling Corosync Cluster Engine (corosync) to terminate: [ OK ]
>>>>> Waiting for corosync services to unload:. [ OK ]
>>>>> Starting Corosync Cluster Engine (corosync): [ OK ]
>>>>>
>>>>>
>>>>> One thing that would stop corosync from shutting down is if it couldn't
>>>>> enter operational state. This often happens because of a firewall
>>>>> enabled on the ports corosync uses to communicate.
>>>>>
>>>>> The system logs would be helpful (with debug: on).
>>>>>
>>>>> Regards
>>>>> -steve
>>>>
>>>>
>>>> And it works fine on Intel based servers, but on Redhat PPC based
>>>> server it doesn't
>>>>
>>>> I attached the config and the log file
>>>>
>>>> Thanks,
>>>> Vadym
>>>
>>> Nothing jumps out from the logs. Thanks for the pointer about ppc. I'll
>>> hunt down some PPC hardware and see if I can reproduce/fix. Could you be
>>> more specific about which ppc (32 or 64) you were running? Where you
>>> running BE and LE in same cluster?
>>>
>>> Please be patient, however. I don't have any ppc hardware personally,
>>> and getting access to non-x86 hardware may take me a few days.
>>
>> That's why I offered to help, since I have access to the PPC and it's
>> in my best interests :)
>>
>> The kernel is ppc64, but most of the utilities are 32-bit, that's how
>> Redhat ships PPC.
>> I compiled 32-bit corosync, anyway. Both machines have identical
>> kernel, so they can't have different byte order.
>>
>> Thanks,
>> Vadym
>
> Without shell access, it is pretty difficult to know exactly what goes wrong
> on a different byte architecture.
>
> We have spent significant time in the past making corosync work well on
> be/le but occasionally new changes break existing archs.
>
I can't provide you with shell access, unfortunately, but I can give
you any info you might need:

$ setarch ppc gcc -E -dM - < /dev/null |grep ENDIAN
#define __BIG_ENDIAN__ 1
#define _BIG_ENDIAN 1

Thanks,
Vadym
_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais
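
The macro dump above shows what the compiler assumes at build time. As a cross-check on the running host, byte order can also be probed at run time. What follows is a minimal Python sketch, not corosync code; the function names host_byte_order and to_network_order are hypothetical and exist only for this illustration.

```python
import struct
import sys


def host_byte_order() -> str:
    """Return 'big' or 'little' from the in-memory layout of a 16-bit integer."""
    # "=H" packs an unsigned 16-bit value in native byte order, standard size.
    native = struct.pack("=H", 1)
    # On a little-endian host the low-order byte (0x01) is stored first.
    return "little" if native[0] == 1 else "big"


def to_network_order(value: int) -> bytes:
    """Pack a 32-bit unsigned integer in network (big-endian) byte order."""
    return struct.pack("!I", value)


if __name__ == "__main__":
    print("host byte order:", host_byte_order())
    print("sys.byteorder  :", sys.byteorder)
    print("0x01020304 on the wire:", to_network_order(0x01020304).hex())
```

On a host whose compiler defines __BIG_ENDIAN__, as in the output above, the first two lines would be expected to report "big"; on the Intel servers they would report "little". The wire representation is the same on both, which is the property a mixed-architecture cluster would rely on.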
