On Tue, 2010-02-02 at 13:36 -0700, hj lee wrote:
> 
> 
> On Tue, Feb 2, 2010 at 12:47 PM, Steven Dake <[email protected]> wrote:
>         
>         On Tue, 2010-02-02 at 12:50 -0700, hj lee wrote:
>         >
>         >
>         > On Tue, Feb 2, 2010 at 11:58 AM, Steven Dake
>         <[email protected]> wrote:
>         >         On Tue, 2010-02-02 at 09:00 -0700, hj lee wrote:
>         >         > Hi,
>         >         >
>         >         > There is still a seg fault when corosync starts. I am using
>         >         > the corosync-1.1.2 EPEL version on CentOS 5.3. Here is the
>         >         > stack trace from the core file.
>         >         >
>         >         > There are numerous programs using syslog on Linux, and they
>         >         > are OK. Why is corosync so vulnerable to syslog? This seg
>         >         > fault in getenv() always happens in the logging worker
>         >         > thread when pcmk_startup() is called. I think this seg fault
>         >         > is caused by the many setenv() calls in pcmk_startup(). So I
>         >         > suggest two ways of fixing it.
>         >         >
>         >         > 1. Delay creating the logsys worker thread until
>         >         > pcmk_startup() has finished. This can be done by moving
>         >         > logsys_fork_completed() to the end of main_service_ready().
>         >         > 2. Remove all the setenv() calls in pcmk_startup() and
>         >         > export the variables in the shell.
>         >         >
>         >         > What do you think?
>         >         >
>         >
>         >
>         >         This problem is fixed in revision 2626 of the flatiron
>         >         branch (which is released in corosync-1.2.0).  This
>         >         problem remains in corosync-1.1.2.
>         >
>         >         The root of the issue is that corosync was using
>         >         non-async-signal-safe POSIX API calls within signal
>         >         handlers.  That has been corrected.
>         >
>         >
>         > Hi again,
>         >
>         > Would you tell me how to see the log and diff of revision 2626? I
>         > had the corosync svn trunk and ran "svn diff -r 2625:2626", but it
>         > returned nothing.
>         >
>         
>         
>         That patch is in the flatiron branch, not the trunk branch.  The
>         trunk revision is a different revision number.  To find the
>         revision number I mentioned, look in the flatiron branch.
>         
>         cd branches/flatiron
>         svn diff -r 2625:2626 > revision-2626.patch
>         
> Thank you for your info. I looked at the diff, but I still think my seg
> fault is not fixed by that patch. I turned off timestamp in
> corosync.conf, so strftime is never called in my test.
> 
> Thanks
> hj
> 
> 
Thanks for your persistence.  The original backtrace you reported is a
direct result of timestamp: on, which the aforementioned patch addresses
(stack frame #3 calls strftime, which only executes with timestamp: on).
You mentioned it still crashes with timestamps turned off, so I
investigated the code further.  According to the pthreads man page,
getenv/setenv are not thread-safe either.  It is possible there is some
thread-related issue with the pacemaker integration and the getenv calls
within corosync.  Could you provide a backtrace of your crash with
timestamp set to off?  It seems to work in my environment.
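
To illustrate the getenv/setenv concern, here is a minimal sketch of
serializing environment access behind one mutex.  The names env_lock,
env_set_locked and env_get_locked are hypothetical and are not part of
corosync or pacemaker.  It only helps if every thread goes through the
wrappers; library routines that read the environment on their own
(tzset() reading TZ, for example) bypass the lock, so setting all
variables before any worker thread starts may be the safer option.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lock guarding every environment access in the process. */
static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

/* Set NAME to VALUE while holding the lock.
 * Returns 0 on success, -1 on error (same as setenv). */
static int env_set_locked(const char *name, const char *value)
{
    int rc;

    pthread_mutex_lock(&env_lock);
    rc = setenv(name, value, 1);    /* 1 = overwrite an existing value */
    pthread_mutex_unlock(&env_lock);
    return rc;
}

/* Copy the value of NAME into BUF while holding the lock, so the caller
 * never keeps a pointer into storage that a later setenv() may replace.
 * Returns 0 on success, -1 if NAME is unset or the value does not fit. */
static int env_get_locked(const char *name, char *buf, size_t len)
{
    const char *value;
    int rc = -1;

    pthread_mutex_lock(&env_lock);
    value = getenv(name);
    if (value != NULL && strlen(value) < len) {
        strcpy(buf, value);         /* fits, including the trailing NUL */
        rc = 0;
    }
    pthread_mutex_unlock(&env_lock);
    return rc;
}
```

A logging thread would then call env_get_locked() into a local buffer
instead of calling getenv() directly, and the startup code would call
env_set_locked() in place of setenv().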

Regards
-steve


_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais
