Tommi,

We've been using tntnet extensively for the past year. Thank you for 
your efforts.

A troublesome problem has appeared, and we have spent many hours over 
several months tracking it down. It was hard to find because it shows 
up only when two web applications run at the same time, and even then 
only occasionally.

In the end we figured out how to reproduce the problem with 100% 
certainty on our test server. The source of the problem is the 
RTLD_GLOBAL flag in your dlopen call on Linux.

Let me explain:

We have two versions of a web application, AppA and AppB. The two 
applications are almost identical. Each uses its own shared library: 
AppA uses LibA and AppB uses LibB. LibA and LibB are also almost 
identical and differ mainly in how recent they are.

Both applications are running on one server via one tntnet service.

Here is the problem as far as we can observe:

- Restart tntnet
- Hit AppA, LibA is used and appears to be cached
- Hit AppB: instead of LibB, the cached LibA is used. It does not 
matter whether LibB and LibA have different file names (libLibB.so and 
libLibA.so, for example) or different major version numbers; the wrong 
library is still used.

If we restart tntnet and hit only AppA, never AppB, then LibA is always 
used. Likewise, if we restart tntnet and hit only AppB, only LibB is used.

The converse of the problem scenario is also observed:

- Restart tntnet
- Hit AppB, LibB is cached
- Hit AppA and instead of it using LibA, LibB is used.

LibA and LibB have essentially the same class and function names (the 
same symbols) but different file names and different version numbers. 
Even so, the operating system does not seem to know that it should not 
use the wrong cached library. This surprised us a great deal. We are 
running Linux kernel 2.6.18-5-686 with cxxtools 1.4.8 and tntnet 1.6.3.

A second, possibly related problem involves threading. Data structures 
in shared libraries that should be private appear to be shared globally 
across all threads. We understand this problem less well than the one 
described above, but it is likely related. See the end of this email 
for more details.

Solution:

Changing RTLD_GLOBAL to RTLD_LOCAL in dlloader.cpp solves the problem 
100% of the time. Returning the code to RTLD_GLOBAL causes the problem 
to return 100% of the time.
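
The before/after comparison can also be tried outside tntnet with a 
small harness. The following is only a sketch and not the code from 
dlloader.cpp. It assumes each component exports a hypothetical 
extern "C" function that returns a version string obtained from the 
library it was linked against; the function name probeComponent and 
the exported symbol are our own inventions for illustration.

```cpp
#include <dlfcn.h>
#include <string>

// Signature we assume the (hypothetical) exported probe function has:
//   extern "C" const char* some_version_function();
using VersionFn = const char* (*)();

// Loads the shared object at `path` with RTLD_GLOBAL or RTLD_LOCAL,
// calls the exported function `symbol`, and returns its result.
// On failure the returned string starts with "dlopen failed" or
// "dlsym failed" followed by the dlerror() text.
std::string probeComponent(const std::string& path,
                           const std::string& symbol,
                           bool global)
{
    const int flags = RTLD_NOW | (global ? RTLD_GLOBAL : RTLD_LOCAL);

    void* handle = dlopen(path.c_str(), flags);
    if (!handle) {
        const char* err = dlerror();
        return std::string("dlopen failed: ") + (err ? err : "unknown error");
    }

    dlerror();  // clear any stale error before calling dlsym
    void* sym = dlsym(handle, symbol.c_str());
    if (!sym) {
        const char* err = dlerror();
        return std::string("dlsym failed: ") + (err ? err : "unknown error");
    }

    // The handle is deliberately left open, as a long-running server
    // would keep its components loaded.
    const char* version = reinterpret_cast<VersionFn>(sym)();
    return version ? version : "";
}
```

Calling probeComponent first on the AppA component and then on the 
AppB component in the same process, once with global set to true and 
once (in a fresh process) with it set to false, should show whether the 
second component ends up bound to the first component's library.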

For our applications there does not appear to be a problem with using 
RTLD_LOCAL. Even standard exception handling works properly across 
dlopened files.

Question: Why do you enable RTLD_GLOBAL? Is it critical for certain 
features in tntnet and if so, which? Is it used in anticipation of 
something required by an application that may use tntnet? What would 
that scenario be?

Suggestion: Make the dlopen mode (RTLD_GLOBAL or RTLD_LOCAL) a 
configuration option in tntnet.conf, with RTLD_LOCAL as the default.
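
To make the suggestion concrete, here is a rough sketch of how such an 
option might be turned into dlopen flags. The setting values "local" 
and "global" and the function name dlopenFlagsFor are hypothetical; 
tntnet has no such option today as far as we know, and how the value 
would be read from tntnet.conf is left open.

```cpp
#include <dlfcn.h>
#include <stdexcept>
#include <string>

// Maps a hypothetical configuration value to dlopen flags.
// "local" is the default, as proposed in the suggestion.
int dlopenFlagsFor(const std::string& scope = "local")
{
    if (scope == "local")
        return RTLD_NOW | RTLD_LOCAL;   // symbols stay private to the component
    if (scope == "global")
        return RTLD_NOW | RTLD_GLOBAL;  // current behaviour: symbols visible to later loads
    throw std::invalid_argument("unknown dlopen scope: " + scope);
}
```

The loader would then pass the result as the second argument to 
dlopen, so existing installations could keep RTLD_GLOBAL by setting 
the option explicitly.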

I look forward to your input on this issue.

Regards,
Paul


Here is a reference to the problem:

"dlopen(..., RTLD_GLOBAL)
pollutes the global namespace with symbols defined in this module
further dlopen/dlsym involves resolving started from previously loaded
modules. This way some symbols get resolved the wrong way
The correct way (as expected by the user) is using RTLD_LOCAL.
RTLD_LOCAL behaves the same way as LoadLibrary() under Windows."

http://bugzilla.gnome.org/show_bug.cgi?id=71615


[Problem with threads and RTLD_GLOBAL]
Dynamically link a shared lib to one of the "components" (ecpp code). 
The shared lib exports one or more data structures that are defined 
globally and instantiated inside the implementation of the shared lib. 
With RTLD_GLOBAL, each worker thread unexpectedly sees these exported 
data structures as if they were scoped globally across all threads, 
rather than as multiple private instances, each local to one thread, 
as one would normally expect. This causes serious threading problems, 
because data structures that are expected to be private to each thread 
are actually shared globally across all threads.


_______________________________________________
Tntnet-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tntnet-general
