On Tue, 2010-02-02 at 13:36 -0700, hj lee wrote:
> On Tue, Feb 2, 2010 at 12:47 PM, Steven Dake <[email protected]> wrote:
> > On Tue, 2010-02-02 at 12:50 -0700, hj lee wrote:
> > > On Tue, Feb 2, 2010 at 11:58 AM, Steven Dake <[email protected]> wrote:
> > > > On Tue, 2010-02-02 at 09:00 -0700, hj lee wrote:
> > > > > Hi,
> > > > >
> > > > > There is still a seg fault when corosync starts. I am using
> > > > > corosync-1.1.2 epel version on CentOS 5.3. Here is the stack
> > > > > trace from the core file.
> > > > >
> > > > > There are numerious program using syslog in Linux, they are
> > > > > OK. Why is the corosync so vulnerable to syslog? This seg
> > > > > fault on getenv() always happens by logging worker thread
> > > > > when pcmk_startup() is called. I think this seg fault is
> > > > > caused by many setenv() calls in pcmk_startup(). So I suggest
> > > > > two ways of fixing.
> > > > >
> > > > > 1. Delay creating logsys worker thread until pcmk_startup()
> > > > >    finished. This can be done by moving
> > > > >    logsys_fork_completed() to the end of main_service_ready().
> > > > > 2. Remove all the setenv() in pcmk_startup and export it in
> > > > >    shell.
> > > > >
> > > > > How do you think?
> > > >
> > > > This problem is fixed in revision 2626 of the flatiron branch
> > > > (which is released in corosync-1.2.0). This problem remains in
> > > > corosync-1.1.2.
> > > >
> > > > The root of the issue is that corosync was using
> > > > non-async-signal safe posix api calls within signal handlers.
> > > > That has been corrected.
> > >
> > > Hi again,
> > >
> > > Would you tell me how to see the log and diff of revision 2626?
> > > I had corosync svn trunk and did "svn diff -r 2625:2626", it
> > > returns nothing.
> >
> > That patch is in flatiron branch, not the trunk branch. The trunk
> > revision is a different revision number. To find the revision
> > number i mentioned, look in the flatiron branch.
> >
> > cd branches/flatiron
> > svn diff -r 2625:2626 > revision-2626.patch
>
> Thank you for you info. I looked the diff. But still I think my seg
> fault is not fixed by that patch. I turned off timestamp in
> corosync.conf, so strftime is never got called in my test.
>
> Thanks
> hj

Thanks for your persistence. The original backtrace you reported is a
direct result of timestamp: on, the case the aforementioned patch
addresses (stack frame #3 calls strftime, which only executes with
timestamp: on).

You mentioned it still crashed with timestamps turned off, so I
investigated the code further. According to the pthreads man page,
getenv/setenv are not thread-safe either. It is possible there is some
thread-related issue with the pacemaker integration and the getenv
calls within corosync.

Could you provide a backtrace of your crash with timestamp set to off?
It seems to work in my environment.
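
To illustrate the getenv/setenv concern, here is a minimal sketch in C.
It is not corosync or pacemaker code. It assumes all setenv() calls can
finish before any thread is created, and it hands the worker a private
copy of the value so the thread never calls getenv() itself. The names
worker_config, snapshot_env, logging_worker,
start_worker_after_env_setup and the variable EXAMPLE_LOG_FACILITY are
hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* for setenv() under strict -std modes */

#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical configuration handed to the worker; not a corosync type. */
struct worker_config {
    char facility[32];  /* private copy of an environment value */
    char seen[32];      /* what the worker observed, for checking */
};

/* Copy an environment variable into buf. Returns 0 on success, -1 if the
 * variable is unset or does not fit. Call only while single-threaded. */
int snapshot_env(const char *name, char *buf, size_t len)
{
    const char *value = getenv(name);

    if (value == NULL || strlen(value) >= len)
        return -1;
    strcpy(buf, value);
    return 0;
}

/* The worker reads its private copy and never touches the environment. */
void *logging_worker(void *arg)
{
    struct worker_config *cfg = arg;

    assert(cfg != NULL);
    strcpy(cfg->seen, cfg->facility);
    return NULL;
}

/* Finish every setenv() call, take the snapshot, and only then create
 * the thread, so no getenv()/setenv() pair can run concurrently. */
int start_worker_after_env_setup(struct worker_config *cfg)
{
    pthread_t tid;

    if (setenv("EXAMPLE_LOG_FACILITY", "daemon", 1) != 0)
        return -1;
    if (snapshot_env("EXAMPLE_LOG_FACILITY", cfg->facility,
                     sizeof(cfg->facility)) != 0)
        return -1;
    if (pthread_create(&tid, NULL, logging_worker, cfg) != 0)
        return -1;
    return pthread_join(tid, NULL) == 0 ? 0 : -1;
}
```

A caller would zero a struct worker_config, pass it to
start_worker_after_env_setup(), and check for a return value of 0. The
ordering is the point: the environment is only modified while the
process is still single-threaded. Build with -pthread.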

Regards
-steve

_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais
