[EMAIL PROTECTED] wrote:

> 
> I have glibc-2.1.2 also.  That is the version I have seen mentioned in
> connection with this.  RH 7.0 has a later glibc, does it not?
> I guess I should start looking for a patch path forward from 2.1.2.

I think I've found the problem.  It seems to be a bug in glibc 2.1.2. 
If a dlopen() call fails, it sets an internal variable to point to a 
string describing the error (for later retrieval with dlerror()).  If 
you then call dlopen() again, without calling dlerror() in between to 
clear the error status, dlopen() will free the string.  Then it tries to 
load the library, and sets the internal variable again depending on 
whether the open succeeded or failed.  However, during the time that its 
trying to load the library, it keeps this internal variable pointing to 
a now-freed block of memory.  As we call dlopen() again before the 
loading of the library finishes, it tries to free the same memory twice. 
  Here is a bit of the code from glibc 2.1.2:

---
   if (result->errstring != NULL)
     /* Free the error string from the last failed command.  This can
        happen if `dlerror' was not run after an error was found.  */
     free (result->errstring);

   result->errcode = _dl_catch_error (&result->errstring, operate, args);
---

_dl_catch_error ends up eventually recursively invoking this same 
function, and, result->errstring hasn't been reset yet.  The code from 
glibc 2.2 looks like:

---
   if (result->errstring != NULL)
     {
       /* Free the error string from the last failed command.  This can
          happen if `dlerror' was not run after an error was found.  */
       if (strcmp (result->errstring, "out of memory") != 0)
         free ((char *) result->errstring);
       result->errstring = NULL;
     }

   result->errcode = _dl_catch_error (&result->objname, &result->errstring,
                                      operate, args);
---

Where they seemed to have addressed this issue.

Short of forcing everyone to upgrade to glibc 2.2, we can work around 
this issue in Wine.  We can make sure that we call dlerror() after every 
(failed) dlopen() call, even if we throw away the return.  This will 
clear the error state, and everything should work well.  Another option 
is to simply call dlerror() immediately before any dlopen() call to make 
sure that the error state is clear before going in.  The first option 
seems cleaner.  However, its complicated by the fact that there are 
dlopen() calls in multiple places, and sometimes the callers assume that 
dlerror() will return meaningful results after the failure of a 
wine-internal routine that calls dlopen().  We have ELFDLL_dlopen() and 
wine_dll_load() where callers can use dlerror() after return.  There is 
also wine_dll_load_main_exe() where callers apparently don't call 
dlerror().  And then there is a dlopen() call off by its lonesome in 
dlls/odbc32/proxyodbc.c which doesn't call dlerror(), and which seems to 
be missing the HAVE_DL_API protection (unless this isn't compiled at all 
in that case?).

Is there any planned direction for this code?  For instance, is the 
stuff in loader/elfdll.c going to go away now?  Perhaps if we could 
isolate all the calls to dlopen() in one place, we could use a better 
error reporting mechanism that using dlerror().  Doing it the way we do 
now requires HAVE_DL_API checks around to see if we can safely call 
dlerror().  And its prone to problems as putting dlerror() calls in 
lower level routines can break routines that are higher up.  We could 
save the return from dlerror() and then give it out to anyone who asked 
for it.  Of course, this would have to be made thread safe, which is a 
pain.  Or we could make the functions that call dlerror() take an extra 
parameter, char **error, and put if (error) *error = dlerror(); inside.

Thoughts?

(BTW, inserting a dlerror() call right before
     if (!ret) ret = dlopen( buffer + dll_path_maxlen + 1, RTLD_NOW );
in library/loader.c hacks around the problem for me for now).


-- 
James Abbatiello


Reply via email to