On 06/22/2010 11:31 AM, Vadym Chepkov wrote:
> On Tue, Jun 22, 2010 at 2:21 PM, Steven Dake<[email protected]>  wrote:
>> On 06/22/2010 11:07 AM, Vadym Chepkov wrote:
>>>
>>> On Tue, Jun 22, 2010 at 1:49 PM, Steven Dake<[email protected]>    wrote:
>>>>
>>>> On 06/22/2010 03:56 AM, Vadym Chepkov wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I decided to check if I can start using corosync again on several of
>>>>> my clusters (have to use heartbeat there at the moment).
>>>>> I don't even have any services defined in corosync.conf (pacemaker is
>>>>> commented out). It is just plain corosync, and it never goes down:
>>>>>
>>>>> # ps axf|grep corosync
>>>>> 26294 pts/0    S+     0:00  |               \_ /bin/sh /sbin/service corosync restart
>>>>> 26299 pts/0    S+     0:01  |                   \_ /bin/bash /etc/init.d/corosync restart
>>>>> 29249 pts/1    S+     0:00                  \_ grep corosync
>>>>> 25959 ?        Ssl    0:00 corosync
>>>>>
>>>>>
>>>>> I attached to the process and this is where it hangs:
>>>>>
>>>>> (gdb) where
>>>>> #0  0x0fe14134 in poll () from /lib/libc.so.6
>>>>> #1  0x0ffbc530 in poll_run (handle=150346236434579456) at coropoll.c:413
>>>>> #2  0x10006e50 in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:1576
>>>>>
>>>>> How can I help to debug this problem?
>>>>> It is 100% reproducible.
>>>>>
>>>>> Thank you,
>>>>> Vadym
>>>>> ________
>>>>
>>>> Vadym,
>>>>
>>>> Thanks for the feedback.  I do test this scenario and it works for me:
>>>>
>>>> [r...@cast flatiron]# service corosync start
>>>> Starting Corosync Cluster Engine (corosync):               [  OK  ]
>>>> [r...@cast flatiron]# service corosync restart
>>>> Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
>>>> Waiting for corosync services to unload:.                  [  OK  ]
>>>> Starting Corosync Cluster Engine (corosync):               [  OK  ]
>>>> [r...@cast flatiron]# service corosync stop
>>>> Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
>>>> Waiting for corosync services to unload:.                  [  OK  ]
>>>> [r...@cast flatiron]# service corosync start
>>>> Starting Corosync Cluster Engine (corosync):               [  OK  ]
>>>> [r...@cast flatiron]# /etc/init.d/corosync restart
>>>> Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
>>>> Waiting for corosync services to unload:.                  [  OK  ]
>>>> Starting Corosync Cluster Engine (corosync):               [  OK  ]
>>>>
>>>>
>>>> One thing that would stop corosync from shutting down is if it couldn't
>>>> enter operational state.  This often happens because of a firewall
>>>> enabled on the ports corosync uses to communicate.
>>>>
>>>> The system logs would be helpful (with debug: on).
>>>>
>>>> Regards
>>>> -steve
>>>
>>>
>>> And it works fine on Intel-based servers, but on a Red Hat PPC-based
>>> server it doesn't.
>>>
>>> I attached the config and the log file
>>>
>>> Thanks,
>>> Vadym
>>
>> Nothing jumps out from the logs.  Thanks for the pointer about ppc. I'll
>> hunt down some PPC hardware and see if I can reproduce/fix.  Could you be
>> more specific about which ppc (32 or 64) you were running?  Were you
>> running BE and LE in the same cluster?
>>
>> Please be patient, however.  I don't have any ppc hardware personally, and
>> getting access to non-x86 hardware may take me a few days.
>
> That's why I offered to help, since I have access to the PPC and it's
> in my best interests :)
>
> The kernel is ppc64, but most of the utilities are 32-bit; that's how
> Red Hat ships PPC.
> I compiled 32-bit corosync, anyway. Both machines have identical
> kernels, so they can't have different byte order.
>
> Thanks,
> Vadym

Without shell access, it is pretty difficult to know exactly what goes
wrong on an architecture with a different byte order.

We have spent significant time in the past making corosync work well on
big-endian and little-endian systems, but occasionally new changes break
existing architectures.
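
As an aside on the byte-order question: one quick way to confirm what each
node reports is a short script like the one below. It is only a sketch and
not part of corosync; the function names host_byte_order and misread_u32 are
made up for illustration. It prints the host byte order and shows how a
32-bit value looks when bytes written big-endian are read back as
little-endian, which is the kind of mismatch an endianness bug produces.

```python
import struct
import sys


def host_byte_order() -> str:
    """Return 'big' or 'little' for the machine running this script."""
    return sys.byteorder


def misread_u32(value: int) -> int:
    """Pack a 32-bit unsigned value big-endian, then read the same
    bytes back as little-endian (hypothetical helper for illustration)."""
    raw = struct.pack(">I", value)
    return struct.unpack("<I", raw)[0]


if __name__ == "__main__":
    print("host byte order:", host_byte_order())
    # A value of 1 written big-endian and misread as little-endian.
    print("1 misread as:", hex(misread_u32(1)))
```

Running it on every node and comparing the first line of output would show
whether the cluster really is homogeneous with respect to byte order.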

Regards
-steve
_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais
