I was never able to exactly reproduce the problem, but I think I came close. My attempt is in a temporary branch called "yamwb" (Yet Another MainWin Bug) on GitHub:
    git clone https://github.com/VirtualGL/virtualgl.git
    cd virtualgl
    git checkout yamwb
    git checkout HEAD~1

The penultimate commit in that branch (https://github.com/VirtualGL/virtualgl/commit/ea45a4121aeb26e70ff3da3c0dcf4af1f156e031) creates a shared library with _init() and _fini() functions, calls XGetSelectionOwner() in the body of _fini(), and links the shared lib with GLXspheres. When running this modified version of GLXspheres in VGL, it doesn't actually lock up, but it does demonstrate a failure in CriticalSection::lock() that occurs when one of VirtualGL's interposed functions is called from another shared lib's global destructor. At that point in the execution, we can't rely on any mutexes except the global one, and that could very well be what's causing MainWin to lock up.

The underlying issue is that, by the time the shared lib's destructor function is called, the GlobalCleanup destructor in VGL's faker has already been called (and if the faker had a global destructor function, it would have already been called as well). At that point, it is difficult or impossible for VGL to operate with any semblance of normalcy, particularly given that mutexes don't work properly. So how is VGL supposed to sanely handle an application calling an interposed X11 or OpenGL function after the interposer itself has been essentially shut down? If we're lucky and this is confined to the XCB interposer, meaning that fixing it is a simple matter of disabling said interposer, then do

    git checkout yamwb

to see my proposed solution (https://github.com/VirtualGL/virtualgl/commit/5328fe5c0d725b4b04c926aaf53eb657548a9028).

Symptomatically, what happens is as follows:

(1) GLXspheres returns from main().
(2) GlobalCleanup::~GlobalCleanup() is called in the faker.
(3) _fini() is called in the shared library.
(4) _fini() calls XGetSelectionOwner() [not interposed].
(5) XGetSelectionOwner() calls xcb_poll_for_event() [interposed].
(6) The interposed xcb_poll_for_event() function attempts to access fconfig to read the status of fconfig.fakeXCB.
(7) fconfig_instance() attempts to lock the mutex guarding its singleton instance.
(8) The CriticalSection lock fails and attempts to throw an error.

NOTE: Step (8) is where MainWin locks up, but my modified version of GLXspheres doesn't. Rather, the error is caught by the catch() handler in xcb_poll_for_event(), safeExit() is called, and the application exits without returning from XGetSelectionOwner() or _fini(). That's still incorrect behavior, because _fini() never returns.

What the proposed solution does:

-- It sets the faker level to 1 within the body of GlobalCleanup::~GlobalCleanup(), effectively disabling any further XCB interposition.
-- It re-arranges the if() statements within faker-xcb.cpp so that the faker level is checked prior to attempting to access the FakerConfig singleton.

That eliminates the problem with my test application, but you'll have to tell me whether or not it fixes the problem with MainWin.

DRC

On 7/11/16 4:34 PM, Nathan Kidd wrote:
> On 11/07/16 03:54 PM, DRC wrote:
>> I need to be able to reproduce this before I can fix it, but my attempts
>> to reproduce it by adding a destructor to GLXspheres and calling
>> XGetSelectionOwner() within the destructor failed. I also tried calling
>> xcb_poll_for_event() within the body of my destructor function, but it
>> just returned NULL without doing anything.
>
> Does it make a difference if you try to do X things from a separate SO's
> _fini()?
>
>> Please help me understand exactly what's going on here and how I can
>> reproduce the problem without using MainWin.
>
> Ha, "reproduce the problem without using MainWin", the story of my life.
> It's going to take some time before I'll get a chance to try.
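
For illustration, the two parts of the proposed solution (raising the faker level during global cleanup, and checking that level before touching the config singleton) can be sketched in isolation. This is a minimal stand-alone sketch, not the code in the yamwb branch: the sketch namespace and the names mutexUsable, fakerLevel, configAccesses, Config, configInstance(), globalCleanup(), Path, and pollForEvent() are all hypothetical stand-ins, and the failing mutex is simulated with a plain flag rather than a real CriticalSection.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins for illustration only; none of these are
// VirtualGL's actual definitions.
namespace sketch {

bool mutexUsable = true;  // becomes false once global teardown has begun
int fakerLevel = 0;       // nonzero means "pass through, do not interpose"
int configAccesses = 0;   // counts successful accesses to the singleton

struct Config { bool fakeXCB = true; };

// Stand-in for a mutex-guarded singleton accessor.  The lock failure seen
// during teardown is simulated by throwing when mutexUsable is false.
Config &configInstance()
{
	if(!mutexUsable) throw std::runtime_error("lock failed during teardown");
	configAccesses++;
	static Config instance;
	return instance;
}

// Stand-in for the faker's global cleanup: raise the faker level so that
// any later call into an interposed function passes straight through.
void globalCleanup()
{
	fakerLevel = 1;
	mutexUsable = false;
}

enum class Path { Interposed, PassThrough };

// Stand-in for an interposed entry point.  Because || short-circuits, the
// faker level is tested first and the singleton is consulted only while
// the faker is still active.
Path pollForEvent()
{
	if(fakerLevel > 0 || !configInstance().fakeXCB) return Path::PassThrough;
	return Path::Interposed;
}

}  // namespace sketch
```

Calling sketch::pollForEvent() before sketch::globalCleanup() takes the interposed path and touches the singleton once. Calling it again afterward takes the pass-through path without reaching configInstance(), so the simulated lock failure is never triggered. With the two conditions in the opposite order, the second call would throw, which corresponds to step (8) above.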
_______________________________________________
VirtualGL-Devel mailing list
VirtualGL-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtualgl-devel