On Tue, Feb 2, 2010 at 3:17 PM, hj lee <[email protected]> wrote:

>
>> Thank you very much for moving forward this issue. The stacktrace I posted
>> was created with timestamd off. The strftime was called by syslog() call.
>> Inside syslog, it calls strftime to get the time. Now I have a clear story
>> how this seg fault happens. It happens very rarely, not easy to reproduce.
>>
>> As I I guessed, this seg fault was caused by setenv() which is called
>> pcmk_startup(). These are those calls grepped from pacemaker source.
>> 5 plugin.c        328 setenv("HA_debug",  "1", 1);
>> 6 plugin.c        332 setenv("HA_debug",  "0", 1);
>> 7 plugin.c        349 setenv("HA_logfile",  value, 1);
>> 8 plugin.c        354 setenv("HA_logfacility",  value, 1);
>> 9 plugin.c        355 setenv("HA_LOGFACILITY",  value, 1);
>> a plugin.c        374 setenv("HA_use_logd", value, 1);
>> b plugin.c        514 setenv("HA_COMPRESSION",  "bz2", 1);
>> c plugin.c        515 setenv("HA_cluster_type", "openais", 1);
>>
>>
>> The setenv() in glibc does realloc current environment variable
>> (__environ) when new environment variable is added. The pcmk is adding 8 new
>> variables while it is initialized.
>>
>> Here are a part of setenv() code from
>> glibc-debuginfo-common-2.5-34.i386.rpm. The last_environ is a copy of
>> __environ.
>> 141       /* We allocated this space; we can extend it.  */
>> 142       new_environ = (char **) realloc (last_environ,
>> 143                                        (size + 2) * sizeof (char *));
>>
>>
>> The getenv() of glibc library copies __environ variable and loop it to
>> find "TZ" environment variable. Here is the getenv() code.
>>
>>  81       for (ep = __environ; *ep != NULL; ++ep)
>>  82         {
>>  83 #if _STRING_ARCH_unaligned
>>  84           uint16_t ep_start = *(uint16_t *) *ep;
>>  85 #else
>>  86           uint16_t ep_start = (((unsigned char *) *ep)[0]
>>  87                                | (((unsigned char *) *ep)[1] << 8));
>>  88 #endif
>>  89
>>  90           if (name_start == ep_start && !strncmp (*ep + 2, name, len)
>>  91               && (*ep)[len + 2] == '=')
>>  92             return &(*ep)[len + 3];
>>  93         }
>>
>> Here is very top of my stacktrace:
>> (gdb) bt
>>
>> #0  0x0013b7ee in getenv (name=<value optimized out>) at getenv.c:90
>> #1  0x00193cf0 in tzset_internal (always=<value optimized out>,
>> explicit=<value optimized out>) at tzset.c:162
>> #2  0x0019480d in __tzset () at tzset.c:543
>>
>> The seg fault exactly happens when getenv() is scheduled out between line
>> 82 - 89, and setenv() changes __environ variable, and getenv() is scheduled
>> again. Then getenv() has the old pointer that was freed by setenv(). Seg
>> fault!
>>
>> So I stongely suggest two proposals I posted eariler.
>> 1. Delay creating logsys thread after pcmk_startup() call
>> 2. Remove most(hopefuly all of them) of setenv() in pcmk_startup(). If
>> they are needed, then administrator can export it in the shell.
>>
>> Because setenv() is not a safe code as you can see, it should not be used.
>>
>> Thank very much.
>> hj
>>
>
> Hi,
>
> Obviously this is a glibc issue. The corosync and pacemaker did nothing
> wrong.  The two proposal are just work around, not fixing the root cause in
> glibc. The glibc can fix this issue by adding LOCK in getenv() function. The
> setenv() has LOCK/UNLOCK on beginning and on exiting. Add the same lock to
> getenv() will prevent this issue. I am not sure how long it will take to fix
> glibc or how hard. but in the meantime, corosync/pacemaker can use wrapper
> functions for get/setenv() which will lock for get/setenv() with pthread
> mutex, so that only one thread can get into get/setenv(). This may be a
> better way to address this issue in corosync/pacemaker.
>
> I want to hear your opinion since this is one of release blocking issue.
> Thank very much
>

Hi,

Think again. Sorry, the wrapper won't work. Because getenv() is called
inside from syslog().

Thanks
_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais

Reply via email to