Re: [osg-users] Multi-threading bug related to Registry
Robert, Note that Rick responded to your question although to the osg-users Digest mail so it will probably not show as a follow up in your e-mail box. See http://lists.openscenegraph.org/pipermail/osg-users-openscenegraph.org/2 008-November/019505.html Sorry for pushing you around, but I happen to be interested in the outcome of this thread. -- Roland -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Robert Osfield Sent: dinsdag 25 november 2008 15:08 To: OpenSceneGraph Users Subject: Re: [osg-users] Multi-threading bug related to Registry Hi Rick, Just to confirm the issue, if you enable Registry caching of imagery and use paged databases that have external image files, then you get the below crash. If you don't use the Registry cache what happens? How reliably does this crash occur? My hypothisis is that the object cache is being cleared by another thread just at the wrong time for this particular section of .ive plugin. If it is correct then use of Registry::readImage(...) that returns a Registry::ReadResult rather than a Image* would be safe as ReadResult uses ref_ptr internally. Robert. On Tue, Nov 25, 2008 at 1:49 PM, Rick Appleton [EMAIL PROTECTED] wrote: Sorry for the delay on this. Here's some more information. Below is a complete callstack which should be similar to when the issue happened. Basically we are loading a Paged terrain database which uses IVE files for data, and DDS textures for images. A quick search in OSG reveals that osgDB::readImageFile is called from many plugin loaders, so I suspect they would have the same issue. osg48-osgDBd.dll!osgDB::Registry::readImageImplementation(cons t std::basic_stringchar,std::char_traitschar,std::allocatorc har fileName=textures/TxMosaic.dds, const osgDB::ReaderWriter::Options * options=0x01f0ceb8) Line 1706C++ osg48-osgDBd.dll!osgDB::Registry::readImage(const std::basic_stringchar,std::char_traitschar,std::allocatorc har fileName=textures/TxMosaic.dds, const osgDB::ReaderWriter::Options * options=0x01f0ceb8) Line 217 + 0x14 bytes C++ osg48-osgDBd.dll!osgDB::readImageFile(const std::basic_stringchar,std::char_traitschar,std::allocatorc har filename=textures/TxMosaic.dds, const osgDB::ReaderWriter::Options * options=0x01f0ceb8) Line 43 + 0x1d bytes C++ osgdb_ived.dll!ive::DataInputStream::readImage(std::basic_stri ngchar,std::char_traitschar,std::allocatorchar filename=textures/TxMosaic.dds) Line 950 + 0x44 bytesC++ osgdb_ived.dll!ive::DataInputStream::readImage(ive::IncludeIma geMode mode=IMAGE_REFERENCE_FILE) Line 1003 + 0x2c bytes C++ osgdb_ived.dll!ive::DataInputStream::readImage() Line 976 + 0xc bytes C++ osgdb_ived.dll!ive::Texture2D::read(ive::DataInputStream * in=0x00dbec2c) Line 54 + 0x8 bytes C++ osgdb_ived.dll!ive::DataInputStream::readStateAttribute() Line 1159 + 0xc bytesC++ osgdb_ived.dll!ive::StateSet::read(ive::DataInputStream * in=0x00dbec2c) Line 173 + 0x8 bytes C++ osgdb_ived.dll!ive::DataInputStream::readStateSet() Line 1066 C++ osgdb_ived.dll!ive::Node::read(ive::DataInputStream * in=0x00dbec2c) Line 124 + 0x8 bytes C++ osgdb_ived.dll!ive::Geode::read(ive::DataInputStream * in=0x00dbec2c) Line 67 C++ osgdb_ived.dll!ive::DataInputStream::readNode() Line 1507 + 0x2d bytes C++ osgdb_ived.dll!ive::Group::read(ive::DataInputStream * in=0x00dbec2c) Line 74 + 0x8 bytes C++ osgdb_ived.dll!ive::DataInputStream::readNode() Line 1499 + 0xc bytes C++ osgdb_ived.dll!ReaderWriterIVE::readNode(std::basic_istreamch ar,std::char_traitschar fin={...}, const osgDB::ReaderWriter::Options * options=0x01f0ceb8) Line 98 + 0xb bytesC++ osgdb_ived.dll!ReaderWriterIVE::readNode(const std::basic_stringchar,std::char_traitschar,std::allocatorc har file=veluwe.ive, const osgDB::ReaderWriter::Options * options=0x01efeb88) Line 69 + 0x23 bytes C++ osg48-osgDBd.dll!osgDB::Registry::ReadNodeFunctor::doRead(osgD B::ReaderWriter rw={...}) Line 1374 + 0x40 bytes C++ osg48-osgDBd.dll!osgDB::Registry::read(const osgDB::Registry::ReadFunctor readFunctor={...}) Line 1527 + 0x22 bytes C++ osg48-osgDBd.dll!osgDB::Registry::readImplementation(const osgDB::Registry::ReadFunctor readFunctor={...}, bool useObjectCache=true) Line 1596 + 0x13 bytes C++ osg48-osgDBd.dll!osgDB::Registry::readNodeImplementation(const std::basic_stringchar,std::char_traitschar,std::allocatorc har fileName=veluwe.ive, const osgDB::ReaderWriter::Options * options=0x01efeb88) Line 1814 + 0x57 bytes C++ osg48-osgDBd.dll!osgDB::Registry::readNode(const std::basic_stringchar,std::char_traitschar,std::allocatorc har
Re: [osg-users] Multi-threading bug related to Registry
HI Roland, I did read the reply, and I have already reponded. Robert. On Fri, Nov 28, 2008 at 8:19 AM, Smeenk, R.J.M. (Roland) [EMAIL PROTECTED] wrote: Robert, Note that Rick responded to your question although to the osg-users Digest mail so it will probably not show as a follow up in your e-mail box. See http://lists.openscenegraph.org/pipermail/osg-users-openscenegraph.org/2 008-November/019505.html Sorry for pushing you around, but I happen to be interested in the outcome of this thread. -- Roland -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Robert Osfield Sent: dinsdag 25 november 2008 15:08 To: OpenSceneGraph Users Subject: Re: [osg-users] Multi-threading bug related to Registry Hi Rick, Just to confirm the issue, if you enable Registry caching of imagery and use paged databases that have external image files, then you get the below crash. If you don't use the Registry cache what happens? How reliably does this crash occur? My hypothisis is that the object cache is being cleared by another thread just at the wrong time for this particular section of .ive plugin. If it is correct then use of Registry::readImage(...) that returns a Registry::ReadResult rather than a Image* would be safe as ReadResult uses ref_ptr internally. Robert. On Tue, Nov 25, 2008 at 1:49 PM, Rick Appleton [EMAIL PROTECTED] wrote: Sorry for the delay on this. Here's some more information. Below is a complete callstack which should be similar to when the issue happened. Basically we are loading a Paged terrain database which uses IVE files for data, and DDS textures for images. A quick search in OSG reveals that osgDB::readImageFile is called from many plugin loaders, so I suspect they would have the same issue. osg48-osgDBd.dll!osgDB::Registry::readImageImplementation(cons t std::basic_stringchar,std::char_traitschar,std::allocatorc har fileName=textures/TxMosaic.dds, const osgDB::ReaderWriter::Options * options=0x01f0ceb8) Line 1706C++ osg48-osgDBd.dll!osgDB::Registry::readImage(const std::basic_stringchar,std::char_traitschar,std::allocatorc har fileName=textures/TxMosaic.dds, const osgDB::ReaderWriter::Options * options=0x01f0ceb8) Line 217 + 0x14 bytes C++ osg48-osgDBd.dll!osgDB::readImageFile(const std::basic_stringchar,std::char_traitschar,std::allocatorc har filename=textures/TxMosaic.dds, const osgDB::ReaderWriter::Options * options=0x01f0ceb8) Line 43 + 0x1d bytes C++ osgdb_ived.dll!ive::DataInputStream::readImage(std::basic_stri ngchar,std::char_traitschar,std::allocatorchar filename=textures/TxMosaic.dds) Line 950 + 0x44 bytesC++ osgdb_ived.dll!ive::DataInputStream::readImage(ive::IncludeIma geMode mode=IMAGE_REFERENCE_FILE) Line 1003 + 0x2c bytes C++ osgdb_ived.dll!ive::DataInputStream::readImage() Line 976 + 0xc bytes C++ osgdb_ived.dll!ive::Texture2D::read(ive::DataInputStream * in=0x00dbec2c) Line 54 + 0x8 bytes C++ osgdb_ived.dll!ive::DataInputStream::readStateAttribute() Line 1159 + 0xc bytesC++ osgdb_ived.dll!ive::StateSet::read(ive::DataInputStream * in=0x00dbec2c) Line 173 + 0x8 bytes C++ osgdb_ived.dll!ive::DataInputStream::readStateSet() Line 1066 C++ osgdb_ived.dll!ive::Node::read(ive::DataInputStream * in=0x00dbec2c) Line 124 + 0x8 bytes C++ osgdb_ived.dll!ive::Geode::read(ive::DataInputStream * in=0x00dbec2c) Line 67 C++ osgdb_ived.dll!ive::DataInputStream::readNode() Line 1507 + 0x2d bytes C++ osgdb_ived.dll!ive::Group::read(ive::DataInputStream * in=0x00dbec2c) Line 74 + 0x8 bytes C++ osgdb_ived.dll!ive::DataInputStream::readNode() Line 1499 + 0xc bytes C++ osgdb_ived.dll!ReaderWriterIVE::readNode(std::basic_istreamch ar,std::char_traitschar fin={...}, const osgDB::ReaderWriter::Options * options=0x01f0ceb8) Line 98 + 0xb bytesC++ osgdb_ived.dll!ReaderWriterIVE::readNode(const std::basic_stringchar,std::char_traitschar,std::allocatorc har file=veluwe.ive, const osgDB::ReaderWriter::Options * options=0x01efeb88) Line 69 + 0x23 bytes C++ osg48-osgDBd.dll!osgDB::Registry::ReadNodeFunctor::doRead(osgD B::ReaderWriter rw={...}) Line 1374 + 0x40 bytes C++ osg48-osgDBd.dll!osgDB::Registry::read(const osgDB::Registry::ReadFunctor readFunctor={...}) Line 1527 + 0x22 bytes C++ osg48-osgDBd.dll!osgDB::Registry::readImplementation(const osgDB::Registry::ReadFunctor readFunctor={...}, bool useObjectCache=true) Line 1596 + 0x13 bytes C++ osg48-osgDBd.dll!osgDB::Registry::readNodeImplementation(const std::basic_stringchar,std::char_traitschar,std::allocatorc har fileName=veluwe.ive, const osgDB::ReaderWriter::Options * options=0x01efeb88) Line 1814 + 0x57 bytes C++ osg48-osgDBd.dll!osgDB::Registry::readNode(const std::basic_stringchar,std
Re: [osg-users] Multi-threading bug related to Registry
Sorry for the delay on this. Here's some more information. Below is a complete callstack which should be similar to when the issue happened. Basically we are loading a Paged terrain database which uses IVE files for data, and DDS textures for images. A quick search in OSG reveals that osgDB::readImageFile is called from many plugin loaders, so I suspect they would have the same issue. osg48-osgDBd.dll!osgDB::Registry::readImageImplementation(const std::basic_stringchar,std::char_traitschar,std::allocatorchar fileName=textures/TxMosaic.dds, const osgDB::ReaderWriter::Options * options=0x01f0ceb8) Line 1706C++ osg48-osgDBd.dll!osgDB::Registry::readImage(const std::basic_stringchar,std::char_traitschar,std::allocatorchar fileName=textures/TxMosaic.dds, const osgDB::ReaderWriter::Options * options=0x01f0ceb8) Line 217 + 0x14 bytes C++ osg48-osgDBd.dll!osgDB::readImageFile(const std::basic_stringchar,std::char_traitschar,std::allocatorchar filename=textures/TxMosaic.dds, const osgDB::ReaderWriter::Options * options=0x01f0ceb8) Line 43 + 0x1d bytes C++ osgdb_ived.dll!ive::DataInputStream::readImage(std::basic_stringchar,std::char_traitschar,std::allocatorchar filename=textures/TxMosaic.dds) Line 950 + 0x44 bytesC++ osgdb_ived.dll!ive::DataInputStream::readImage(ive::IncludeImageMode mode=IMAGE_REFERENCE_FILE) Line 1003 + 0x2c bytes C++ osgdb_ived.dll!ive::DataInputStream::readImage() Line 976 + 0xc bytes C++ osgdb_ived.dll!ive::Texture2D::read(ive::DataInputStream * in=0x00dbec2c) Line 54 + 0x8 bytes C++ osgdb_ived.dll!ive::DataInputStream::readStateAttribute() Line 1159 + 0xc bytesC++ osgdb_ived.dll!ive::StateSet::read(ive::DataInputStream * in=0x00dbec2c) Line 173 + 0x8 bytes C++ osgdb_ived.dll!ive::DataInputStream::readStateSet() Line 1066 C++ osgdb_ived.dll!ive::Node::read(ive::DataInputStream * in=0x00dbec2c) Line 124 + 0x8 bytes C++ osgdb_ived.dll!ive::Geode::read(ive::DataInputStream * in=0x00dbec2c) Line 67 C++ osgdb_ived.dll!ive::DataInputStream::readNode() Line 1507 + 0x2d bytes C++ osgdb_ived.dll!ive::Group::read(ive::DataInputStream * in=0x00dbec2c) Line 74 + 0x8 bytes C++ osgdb_ived.dll!ive::DataInputStream::readNode() Line 1499 + 0xc bytes C++ osgdb_ived.dll!ReaderWriterIVE::readNode(std::basic_istreamchar,std::char_traitschar fin={...}, const osgDB::ReaderWriter::Options * options=0x01f0ceb8) Line 98 + 0xb bytesC++ osgdb_ived.dll!ReaderWriterIVE::readNode(const std::basic_stringchar,std::char_traitschar,std::allocatorchar file=veluwe.ive, const osgDB::ReaderWriter::Options * options=0x01efeb88) Line 69 + 0x23 bytes C++ osg48-osgDBd.dll!osgDB::Registry::ReadNodeFunctor::doRead(osgDB::ReaderWriter rw={...}) Line 1374 + 0x40 bytes C++ osg48-osgDBd.dll!osgDB::Registry::read(const osgDB::Registry::ReadFunctor readFunctor={...}) Line 1527 + 0x22 bytes C++ osg48-osgDBd.dll!osgDB::Registry::readImplementation(const osgDB::Registry::ReadFunctor readFunctor={...}, bool useObjectCache=true) Line 1596 + 0x13 bytes C++ osg48-osgDBd.dll!osgDB::Registry::readNodeImplementation(const std::basic_stringchar,std::char_traitschar,std::allocatorchar fileName=veluwe.ive, const osgDB::ReaderWriter::Options * options=0x01efeb88) Line 1814 + 0x57 bytes C++ osg48-osgDBd.dll!osgDB::Registry::readNode(const std::basic_stringchar,std::char_traitschar,std::allocatorchar fileName=veluwe.ive, const osgDB::ReaderWriter::Options * options=0x01efeb88) Line 232 + 0x98 bytes C++ osg48-osgDBd.dll!osgDB::readNodeFile(const std::basic_stringchar,std::char_traitschar,std::allocatorchar filename=veluwe.ive, const osgDB::ReaderWriter::Options * options=0x01efeb88) Line 69 + 0x1d bytes C++ osg48-osgDBd.dll!osgDB::readNodeFiles(osg::ArgumentParser arguments={...}, const osgDB::ReaderWriter::Options * options=0x01efeb88) Line 224 + 0x43 bytesC++ osgviewerd.exe!osgDB::readNodeFiles(osg::ArgumentParser parser={...}) Line 132 + 0x32 bytes C++ osgviewerd.exe!main(int argc=2, char * * argv=0x01eee120) Line 134 + 0x9 bytes C++ osgviewerd.exe!__tmainCRTStartup() Line 597 + 0x19 bytes C osgviewerd.exe!mainCRTStartup() Line 414 C kernel32.dll!7c816d4f() [Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll] kernel32.dll!7c8399f3() -Original Message- From: [EMAIL PROTECTED] on behalf of Robert Osfield Sent: Tue 18/11/2008 17:35 To: OpenSceneGraph Users Subject: Re: [osg-users] Multi-threading bug related to Registry Hi Rick, The solution is for ref_ptrImage to be used, but where I can't say as you don't provide full details on what code is at fault. Which plugins
Re: [osg-users] Multi-threading bug related to Registry
134 + 0x9 bytes C++ osgviewerd.exe!__tmainCRTStartup() Line 597 + 0x19 bytes C osgviewerd.exe!mainCRTStartup() Line 414 C kernel32.dll!7c816d4f() [Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll] kernel32.dll!7c8399f3() -Original Message- From: [EMAIL PROTECTED] on behalf of Robert Osfield Sent: Tue 18/11/2008 17:35 To: OpenSceneGraph Users Subject: Re: [osg-users] Multi-threading bug related to Registry Hi Rick, The solution is for ref_ptrImage to be used, but where I can't say as you don't provide full details on what code is at fault. Which plugins are you invoking in this instance? .osg? .ive? Others? Robert. On Mon, Nov 17, 2008 at 1:23 PM, Rick Appleton [EMAIL PROTECTED] wrote: There is a nasty multi-threading bug related to osgDB::DatabasePager and the _objectCache in osgDB::Registry. This is the situation. * Thread 2 stack (DatabasePager thread): ReaderWriter::ReadResult Registry::readImplementation ReaderWriter::ReadResult Registry::readImageImplementation ReaderWriter::ReadResult Registry::readImage osg::Image* osgDB::readImageFile osg::Image* DataInputStream::readImage(std::string filename) osg::Image* DataInputStream::readImage(IncludeImageMode mode) osg::Image* DataInputStream::readImage() void Texture2D::read(DataInputStream* in) .. - After reading in the file in Registry::readImplementation the refCount of the new Image is 1 (ReadResult contains a ref_ptr). - This Image is then inserted into Registry::_objectCache with the default timestamp (0.0), which increments the recCount to 2. - The ReadResult object is returned, so no net change in refCount, until the stack is unwound up to osgDB::readImageFile. - osgDB::readImageFile takes a normal pointer to the Image, which does not change the refCount. When osgDB::readImageFile returns, the ReadResult object goes out of scope, deleting a reference to the Image. refCount is now 1. - Stack unwinds until back in Texture2D::read at which point the normal Image* is passed into Texture2D::setImage which assigns the Image* to the internal ref_ptr. At this moment the refCount is increased back to 2. Imagine that somewhere during the stack unwind between osgDB::readImageFile and Texture2D::read a thread switch takes place. * Thread 1: The main thread then reaches DatabasePager::removeExpiredSubgraphs which does the following at the end of the function. - Calls Registry::updateTimeStampOfObjectsInCacheWithExternalReferences, which goes through the objects and increments the timestamp if the refCount1 (this is not true for just loaded Image, so timestamp remains 0.0). - Calls Registry::removeExpiredObjectsInCache, which goes through all the objects and compares the timestamp with the expiry time. Because the Image was added with timestamp 0.0 (and the timestamp was not updated), it is added to objectsToRemove. - Registry::removeExpiredObjectsInCache then removes all the objects in objectsToRemove from Registry::_objectCache. This causes the refCount on the Image to decrement to 0, causing the actual Image object to be freed. At this point Thread 2 has an Image* to freed memory, but does not realize the object has been freed. In our case this eventually led to crashes. There are two solutions to this: - Instead of using the default timestamp of 0.0, set the time explicitly to the current time. This makes sure the item is not immediately discarded from the Registry. Even if not all loads are finished before the new item times out, the item will have been acquired by some other place in the code, so the refCount will be larger than 1 when the timeout occurs. This requires the current time to be passed into readImplementation. A proper solution would be to pass in the timestamp of the DatabaseRequest into the read* functions. That requires modification of a lot of functions though. - Changing the return types of the sequence above so there aren't any normal osg::Image* passed around, but only ref_ptr. This makes sure that refCount never goes to during the loading process. This option also requires the modification of a large number of functions. We chose a slightly less proper solution following the first option. We've added a function to set the time in Registry once a frame, and call that just before DatabasePager::updateSceneGraph (which calls DatabasePager::removeExpiredSubgraphs). Please feel free to contact me for more info as needed, Rick ___ osg-users mailing list osg-users@lists.openscenegraph.org http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org ___ osg-users mailing list osg-users@lists.openscenegraph.org http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org ___ osg-users
Re: [osg-users] Multi-threading bug related to Registry
Hi Rick, it looks like Robert Osfield missed this post otherwise he already would have reacted to it. ;) -- Roland From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Rick Appleton Sent: maandag 17 november 2008 14:24 To: osg-users@lists.openscenegraph.org Subject: [osg-users] Multi-threading bug related to Registry There is a nasty multi-threading bug related to osgDB::DatabasePager and the _objectCache in osgDB::Registry. This is the situation. * Thread 2 stack (DatabasePager thread): ReaderWriter::ReadResult Registry::readImplementation ReaderWriter::ReadResult Registry::readImageImplementation ReaderWriter::ReadResult Registry::readImage osg::Image* osgDB::readImageFile osg::Image* DataInputStream::readImage(std::string filename) osg::Image* DataInputStream::readImage(IncludeImageMode mode) osg::Image* DataInputStream::readImage() void Texture2D::read(DataInputStream* in) .. - After reading in the file in Registry::readImplementation the refCount of the new Image is 1 (ReadResult contains a ref_ptr). - This Image is then inserted into Registry::_objectCache with the default timestamp (0.0), which increments the recCount to 2. - The ReadResult object is returned, so no net change in refCount, until the stack is unwound up to osgDB::readImageFile. - osgDB::readImageFile takes a normal pointer to the Image, which does not change the refCount. When osgDB::readImageFile returns, the ReadResult object goes out of scope, deleting a reference to the Image. refCount is now 1. - Stack unwinds until back in Texture2D::read at which point the normal Image* is passed into Texture2D::setImage which assigns the Image* to the internal ref_ptr. At this moment the refCount is increased back to 2. Imagine that somewhere during the stack unwind between osgDB::readImageFile and Texture2D::read a thread switch takes place. * Thread 1: The main thread then reaches DatabasePager::removeExpiredSubgraphs which does the following at the end of the function. - Calls Registry::updateTimeStampOfObjectsInCacheWithExternalReferences, which goes through the objects and increments the timestamp if the refCount1 (this is not true for just loaded Image, so timestamp remains 0.0). - Calls Registry::removeExpiredObjectsInCache, which goes through all the objects and compares the timestamp with the expiry time. Because the Image was added with timestamp 0.0 (and the timestamp was not updated), it is added to objectsToRemove. - Registry::removeExpiredObjectsInCache then removes all the objects in objectsToRemove from Registry::_objectCache. This causes the refCount on the Image to decrement to 0, causing the actual Image object to be freed. At this point Thread 2 has an Image* to freed memory, but does not realize the object has been freed. In our case this eventually led to crashes. There are two solutions to this: - Instead of using the default timestamp of 0.0, set the time explicitly to the current time. This makes sure the item is not immediately discarded from the Registry. Even if not all loads are finished before the new item times out, the item will have been acquired by some other place in the code, so the refCount will be larger than 1 when the timeout occurs. This requires the current time to be passed into readImplementation. A proper solution would be to pass in the timestamp of the DatabaseRequest into the read* functions. That requires modification of a lot of functions though. - Changing the return types of the sequence above so there aren't any normal osg::Image* passed around, but only ref_ptr. This makes sure that refCount never goes to during the loading process. This option also requires the modification of a large number of functions. We chose a slightly less proper solution following the first option. We've added a function to set the time in Registry once a frame, and call that just before DatabasePager::updateSceneGraph (which calls DatabasePager::removeExpiredSubgraphs). Please feel free to contact me for more info as needed, Rick This e-mail and its contents are subject to the DISCLAIMER at http://www.tno.nl/disclaimer/email.html ___ osg-users mailing list osg-users@lists.openscenegraph.org http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org
Re: [osg-users] Multi-threading bug related to Registry
Hi Rick, The solution is for ref_ptrImage to be used, but where I can't say as you don't provide full details on what code is at fault. Which plugins are you invoking in this instance? .osg? .ive? Others? Robert. On Mon, Nov 17, 2008 at 1:23 PM, Rick Appleton [EMAIL PROTECTED] wrote: There is a nasty multi-threading bug related to osgDB::DatabasePager and the _objectCache in osgDB::Registry. This is the situation. * Thread 2 stack (DatabasePager thread): ReaderWriter::ReadResult Registry::readImplementation ReaderWriter::ReadResult Registry::readImageImplementation ReaderWriter::ReadResult Registry::readImage osg::Image* osgDB::readImageFile osg::Image* DataInputStream::readImage(std::string filename) osg::Image* DataInputStream::readImage(IncludeImageMode mode) osg::Image* DataInputStream::readImage() void Texture2D::read(DataInputStream* in) .. - After reading in the file in Registry::readImplementation the refCount of the new Image is 1 (ReadResult contains a ref_ptr). - This Image is then inserted into Registry::_objectCache with the default timestamp (0.0), which increments the recCount to 2. - The ReadResult object is returned, so no net change in refCount, until the stack is unwound up to osgDB::readImageFile. - osgDB::readImageFile takes a normal pointer to the Image, which does not change the refCount. When osgDB::readImageFile returns, the ReadResult object goes out of scope, deleting a reference to the Image. refCount is now 1. - Stack unwinds until back in Texture2D::read at which point the normal Image* is passed into Texture2D::setImage which assigns the Image* to the internal ref_ptr. At this moment the refCount is increased back to 2. Imagine that somewhere during the stack unwind between osgDB::readImageFile and Texture2D::read a thread switch takes place. * Thread 1: The main thread then reaches DatabasePager::removeExpiredSubgraphs which does the following at the end of the function. - Calls Registry::updateTimeStampOfObjectsInCacheWithExternalReferences, which goes through the objects and increments the timestamp if the refCount1 (this is not true for just loaded Image, so timestamp remains 0.0). - Calls Registry::removeExpiredObjectsInCache, which goes through all the objects and compares the timestamp with the expiry time. Because the Image was added with timestamp 0.0 (and the timestamp was not updated), it is added to objectsToRemove. - Registry::removeExpiredObjectsInCache then removes all the objects in objectsToRemove from Registry::_objectCache. This causes the refCount on the Image to decrement to 0, causing the actual Image object to be freed. At this point Thread 2 has an Image* to freed memory, but does not realize the object has been freed. In our case this eventually led to crashes. There are two solutions to this: - Instead of using the default timestamp of 0.0, set the time explicitly to the current time. This makes sure the item is not immediately discarded from the Registry. Even if not all loads are finished before the new item times out, the item will have been acquired by some other place in the code, so the refCount will be larger than 1 when the timeout occurs. This requires the current time to be passed into readImplementation. A proper solution would be to pass in the timestamp of the DatabaseRequest into the read* functions. That requires modification of a lot of functions though. - Changing the return types of the sequence above so there aren't any normal osg::Image* passed around, but only ref_ptr. This makes sure that refCount never goes to during the loading process. This option also requires the modification of a large number of functions. We chose a slightly less proper solution following the first option. We've added a function to set the time in Registry once a frame, and call that just before DatabasePager::updateSceneGraph (which calls DatabasePager::removeExpiredSubgraphs). Please feel free to contact me for more info as needed, Rick ___ osg-users mailing list osg-users@lists.openscenegraph.org http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org ___ osg-users mailing list osg-users@lists.openscenegraph.org http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org
[osg-users] Multi-threading bug related to Registry
There is a nasty multi-threading bug related to osgDB::DatabasePager and the _objectCache in osgDB::Registry. This is the situation. * Thread 2 stack (DatabasePager thread): ReaderWriter::ReadResult Registry::readImplementation ReaderWriter::ReadResult Registry::readImageImplementation ReaderWriter::ReadResult Registry::readImage osg::Image* osgDB::readImageFile osg::Image* DataInputStream::readImage(std::string filename) osg::Image* DataInputStream::readImage(IncludeImageMode mode) osg::Image* DataInputStream::readImage() void Texture2D::read(DataInputStream* in) .. - After reading in the file in Registry::readImplementation the refCount of the new Image is 1 (ReadResult contains a ref_ptr). - This Image is then inserted into Registry::_objectCache with the default timestamp (0.0), which increments the recCount to 2. - The ReadResult object is returned, so no net change in refCount, until the stack is unwound up to osgDB::readImageFile. - osgDB::readImageFile takes a normal pointer to the Image, which does not change the refCount. When osgDB::readImageFile returns, the ReadResult object goes out of scope, deleting a reference to the Image. refCount is now 1. - Stack unwinds until back in Texture2D::read at which point the normal Image* is passed into Texture2D::setImage which assigns the Image* to the internal ref_ptr. At this moment the refCount is increased back to 2. Imagine that somewhere during the stack unwind between osgDB::readImageFile and Texture2D::read a thread switch takes place. * Thread 1: The main thread then reaches DatabasePager::removeExpiredSubgraphs which does the following at the end of the function. - Calls Registry::updateTimeStampOfObjectsInCacheWithExternalReferences, which goes through the objects and increments the timestamp if the refCount1 (this is not true for just loaded Image, so timestamp remains 0.0). - Calls Registry::removeExpiredObjectsInCache, which goes through all the objects and compares the timestamp with the expiry time. Because the Image was added with timestamp 0.0 (and the timestamp was not updated), it is added to objectsToRemove. - Registry::removeExpiredObjectsInCache then removes all the objects in objectsToRemove from Registry::_objectCache. This causes the refCount on the Image to decrement to 0, causing the actual Image object to be freed. At this point Thread 2 has an Image* to freed memory, but does not realize the object has been freed. In our case this eventually led to crashes. There are two solutions to this: - Instead of using the default timestamp of 0.0, set the time explicitly to the current time. This makes sure the item is not immediately discarded from the Registry. Even if not all loads are finished before the new item times out, the item will have been acquired by some other place in the code, so the refCount will be larger than 1 when the timeout occurs. This requires the current time to be passed into readImplementation. A proper solution would be to pass in the timestamp of the DatabaseRequest into the read* functions. That requires modification of a lot of functions though. - Changing the return types of the sequence above so there aren't any normal osg::Image* passed around, but only ref_ptr. This makes sure that refCount never goes to during the loading process. This option also requires the modification of a large number of functions. We chose a slightly less proper solution following the first option. We've added a function to set the time in Registry once a frame, and call that just before DatabasePager::updateSceneGraph (which calls DatabasePager::removeExpiredSubgraphs). Please feel free to contact me for more info as needed, Rick ___ osg-users mailing list osg-users@lists.openscenegraph.org http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org