DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT <http://nagoya.apache.org/bugzilla/show_bug.cgi?id=17516>. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE.
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=17516 Thread safety problems in .../util/ and ../util/regx Summary: Thread safety problems in .../util/ and ../util/regx Product: Xerces-C++ Version: 2.1.0 Platform: Sun OS/Version: Solaris Status: NEW Severity: Major Priority: Other Component: Validating Parser (Schema) (Xerces 1.5 or up only) AssignedTo: [EMAIL PROTECTED] ReportedBy: [EMAIL PROTECTED] Hi all, Firstly a quick caveat: While I'm only running under 2.1.0, I've checked 2.2.0 at cvs.apache.org and the faults I've found in the code appear to still exist. But, since I haven't actually compiled and run under 2.2.0, I'm more than happy to be proved wrong... I've written an application which contains multiple DOM parsers each running in a seperate thread, each doing full validation using a schema that contains regex facets. When this application is run on a multi-processor host (SunFire 880) fatal errors (core dumps, assertions, etc) are regularly observed. >From my investigations I believe the fault is at least partially due to errors in the thread-saftey of the RangeTokenMap class (.../util/regx/RangeTokenMap.*). The single static RangeTokenMap object is created in the first call to the static RangeTokenMap::instance() method. The race condition is supposedly handled by the compareAndSwap() method which initialises the static fInstance pointer only if it's NULL. Even though compareAndSwap is mutex protected, by this stage it's already too late. Because the race condition results in double construction of the RangeTokenMap object, the second instance has to be deleted. Unfortunately the destructor contains a line setting the fInstance pointer back to NULL. Since fInstance is a static pointer, this clobbers the RangeTokenMap we actually want to use. So, by the time we come to return fInstance from RangeTokenMap::instance() it is useless... I replaced the implementation of RangeTokenMap::instance() with a version that only constructs the object once... RangeTokenMap* RangeTokenMap::instance() { static XMLRegisterCleanup instanceCleanup; if( !fInstance ) { XMLMutexLock lockInit( &fMutex ); if( !fInstance ) { fInstance = new RangeTokenMap(); instanceCleanup.registerCleanup( reinitInstance ); } } return (fInstance); } Obviously this change meant fMutex needed to become static. I also added mutex protection for the other methods of RangeTokenMap, including moving the locking in getRange() to the start of the method. These extra locks may have been overkill, but since the methods didn't appear to be obviously const I thought it was probably prudent. Can someone who knows the intended uses of RangeTokenMap better than I do (i.e. at all) advise me otherwise? After rebuilding Xerces with these fixes, the fatal errors became less frequent, but didn't disappear completely. I grepped around in the source a bit more, and I found the RangeTokenMap approach to constructing static objects was reused in the EncodingValidator class (.../util/EncodingValidator.*). I made the same single construction and method mutexing changes there as well. Finally tracked another problem to the use of the RangeTokenMap in the TokenFactory class (.../util/regx/TokenFactory.*). The TokenFactory::initializeRegistry() method calls multiple methods of the (static) RangeTokenMap object, but only if its fRangeInitialized flag is false. Since fRangeInitialized wasn't static, every thread that came to initialise the TokenFactory did the RangeTokenMap initialisation again. I don't fully recall why this multiple initialisation was harmful, as opposed to just unecessary, since it's a little while since I was looking at this problem. The fix I made was to add a static mutex to TokenFactory and made the fRangeInitialized flag static. The mutex is locked before attempting anything inside TokenFactory::initializeRegistry() After making all three of these changes I haven't had any further problems with my app. But, since most of the changes were made without any real understanding of what these objects actually do, I have a suspicion that I may have let myself in for some more subtle problems in the future. Unfortunately I can't provide any test code for these bugs, since my app is too tightly bound to our code base to be useful to anyone else and I don't have time to create a fully portable version. Besides which, I doubt that test code would necessarily guarantee these issues were excercised anyway... Hopefully I've provided enough explanation to illustrate the problem without example code. Thanks for your time... Cheers, Andrew --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
