Tommi,

We've been using tntnet extensively for the past year. Thank you for your efforts.
A fairly troublesome problem has appeared, and we've spent countless hours over many months trying to figure out what it was. It was difficult to track down partly because it only appears when two web applications run at the same time, and then only occasionally. In the end we figured out how to reproduce it with 100% certainty on our test server. The source of the problem is the RTLD_GLOBAL flag in your dlopen call on Linux.

Let me explain. We have two versions of a web application, AppA and AppB. The two applications are almost identical. Both use shared libraries that are also almost identical and differ only in how recent they are: AppA uses shared LibA and AppB uses shared LibB. Both applications run on one server via one tntnet service.

Here is the problem as far as we can observe:

- Restart tntnet
- Hit AppA; LibA is used and appears to be cached
- Hit AppB; instead of LibB, the cached LibA is used

It does not matter whether LibB and LibA have different file names (libLibB.so and libLibA.so, for example), nor whether they have different major version numbers: the wrong library is used. If one restarts tntnet and only hits AppA and never AppB, then LibA is always used. Likewise, restart tntnet and hit only AppB, and only LibB is used.

The converse of the problem scenario is also observed:

- Restart tntnet
- Hit AppB; LibB is cached
- Hit AppA; instead of LibA, LibB is used

LibA and LibB have essentially the same class and function names (the same symbols), but they have different file names and different version numbers. Even so, the operating system does not appear to know that it should not use the wrong cached library. Needless to say, this surprises us to no end.

We are running Linux kernel 2.6.18-5-686 with cxxtools 1.4.8 and tntnet 1.6.3.

Another potentially related problem is with threading, but I won't get into it just now.
It appears that what should be private data structures from shared libraries are shared globally across all threads. This problem is not as well understood as the one described above but is likely related. See the end of this email for more details.

Solution: Changing RTLD_GLOBAL to RTLD_LOCAL in dlloader.cpp solves the problem 100% of the time. Returning the code to RTLD_GLOBAL causes the problem to return 100% of the time. For our applications there does not appear to be a problem with using RTLD_LOCAL. Even standard exception handling functions properly across dlopened files.

Question: Why do you enable RTLD_GLOBAL? Is it critical for certain features in tntnet, and if so, which? Is it used in anticipation of something required by an application that may use tntnet? What would that scenario be?

Suggestion: Make the choice between RTLD_GLOBAL and RTLD_LOCAL a configuration option in tntnet.conf, with RTLD_LOCAL as the default.

I look forward to your input on this issue.

Regards,
Paul

Here is a reference to the problem:

"dlopen(..., RTLD_GLOBAL) pollutes the global namespace with symbols defined in this module further dlopen/dlsym involves resolving started from previously loaded modules. This way some symbols get resolved the wrong way The correct way (as expected by the user) is using RTLD_LOCAL. RTLD_LOCAL behaves the same way as LoadLibrary() under Windows."
http://bugzilla.gnome.org/show_bug.cgi?id=71615

[Problem with threads and RTLD_GLOBAL]

Dynamically link a shared lib to one of the "components" (ecpp code). In the shared lib, there are one or more exported data structures that are globally defined and instantiated inside the implementation of the shared lib.
With RTLD_GLOBAL, each worker thread unexpectedly has access to copies of the exported data structures as if they were scoped globally across all threads (as opposed to multiple private instances, each local to one thread, as one would normally expect). This causes serious threading problems, because what are expected to be private data structures belonging to each thread are actually shared globally across all threads.
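
To make the threading symptom concrete, here is a minimal, self-contained sketch of the difference between a globally defined object and a per-thread one. It does not involve tntnet or dlopen at all; `Counter`, `g_shared`, `t_private` and `runWorkers` are hypothetical names made up for illustration. In C++, a globally defined object exists once per process and is visible to every thread, while per-thread instances have to be requested explicitly, for example with `thread_local`. Whether this alone accounts for what we see with RTLD_GLOBAL is an assumption, not something we have verified.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in for a data structure exported by a shared library.
struct Counter {
    std::atomic<int> hits{0};
};

// Globally defined: one instance per process, seen by every thread.
Counter g_shared;

// Declared thread_local: every thread gets its own private instance.
thread_local int t_private = 0;

// Starts `threads` workers. Each one increments both counters `n` times
// and records the final value of its own private counter.
std::vector<int> runWorkers(int threads, int n) {
    std::vector<int> privateTotals(threads, 0);
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i) {
        pool.emplace_back([&privateTotals, i, n] {
            for (int k = 0; k < n; ++k) {
                ++g_shared.hits;  // shared across all threads
                ++t_private;      // private to this thread
            }
            privateTotals[i] = t_private;
        });
    }
    for (auto& t : pool) {
        t.join();
    }
    return privateTotals;
}
```

After `runWorkers(4, 1000)`, the shared counter has accumulated the increments of all four workers, while each worker's private counter only reflects its own. The shared counter is atomic here only so that the sketch itself is free of data races.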

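
The suggestion of a configurable dlopen scope could look roughly like the sketch below. This is not the code in dlloader.cpp: `SymbolScope`, `dlopenFlags` and `openLibrary` are hypothetical names, no such option exists in tntnet.conf today as far as we know, and the use of RTLD_NOW is an assumption. Only `dlopen`, `dlerror` and the `RTLD_*` constants are the real interface from `<dlfcn.h>`.

```cpp
#include <dlfcn.h>
#include <stdexcept>
#include <string>

// Hypothetical setting, e.g. read from tntnet.conf (not an existing option).
enum class SymbolScope { Local, Global };

// Builds the flags passed to dlopen for the requested symbol scope.
// RTLD_NOW is an assumption; the real loader may bind lazily instead.
int dlopenFlags(SymbolScope scope) {
    return RTLD_NOW | (scope == SymbolScope::Global ? RTLD_GLOBAL : RTLD_LOCAL);
}

// Opens a shared library, defaulting to RTLD_LOCAL so that its symbols are
// not used to resolve references in libraries loaded later.
// Throws std::runtime_error with the dlerror() text on failure.
void* openLibrary(const std::string& path,
                  SymbolScope scope = SymbolScope::Local) {
    dlerror();  // clear any stale error state
    void* handle = dlopen(path.c_str(), dlopenFlags(scope));
    if (!handle) {
        const char* msg = dlerror();
        throw std::runtime_error(msg ? msg : "dlopen failed");
    }
    return handle;
}
```

With something like this, `openLibrary("libLibA.so")` would keep LibA's symbols private to that handle, and a component that really needs its symbols exported could ask for `SymbolScope::Global` explicitly. On older glibc versions the program has to be linked with -ldl.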