On Mon, Mar 23, 2009 at 8:27 PM, Carsten Neumann <[email protected]> wrote: > Hello Allen, > > Allen Bierbaum wrote: >> I think I have finally tracked down some of the issues with our >> application, but I wanted to write to ask if anyone can see any others >> and comment on one issue I think may need a fix/workaround in OpenSG. >> >> For purposes of this discussion, our application functions as follows: >> - There is a primordial thread P that takes care of all rendering from >> a single scene graph >> - There are 2 background threads (BG1, BG2) that load data from disk >> and periodically re-sync with the primordial thread to connect up the >> data they loaded. >> - All threads share the same aspect >> - No threads make changes to the same FC at the same time. (BG1 and >> BG2 create data, P connects that data to the scene graph) >> - References to FCs are held using MTRecPtr's > > I guess you know it, but I'll mention it anyways: in this scenario you > don't need the MT pointer as there is only one aspect. > >> 1) The first question is, what issues do the experts on the list see >> with this type of arrangement. Our simplistic thinking had been that >> as long as we don't write to the same FC at the same time in the same >> thread we would be ok. Now that we are into it further, we are >> worried about a couple of specific issues. >> >> * Change lists: As discussed in another posting, we think that the >> change list should be thread safe and that as long as we call >> commitChanges() we should be good. > > the change list is a per-thread data structure. I'm not quite sure what > you are referring to with "call commitChanges()..."; did you mean call > commitChanges() atomically?
What I meant is that you have to call commitChanges() in each thread before you can reliably use the newly created fc's in another thread. My understanding from the previous discussions is that this is required to get the pending changes to propagate and update everything. >> * Aspect store: Each FC has an aspect store. If two threads read >> from an FC in parallel, is it possible to corrupt the aspect store? >> (we have seen a few crashes on testing size of the aspect store of an >> fc) > > off hand I don't see how that could be a problem; your threads all run > on the same aspect, so there is no need to create new aspect copies and > store them in the AspectStore. In other words I would not expect any > writes to the AspectStore to happen once a container is created so it > should not become corrupted. Do you have more context when this happens? Not really. Just saw in twice today and it could be a side-effect of the other things we are talking about. >> * FCFactory: All new field containers get registered with the >> field container factory. Is the registration process thread safe? > > IIRC the factory uses locking to gain mutually exclusive access to the > internal storage, so yes, this should be safe. > >> * Other thread safety issues: Is there anywhere else we are not >> thinking off that may have some race conditions when sharing the same >> aspect across multiple threads? > > there are cases where the changed() member modifies other containers or > propagates to parent containers, so the set of touched containers may be > larger than those touched by the application (bounding volume > invalidation is the prime example here). Yep. We saw that one too. It is why we don't connect up nodes to the graph except in the primordial thread now. > >> 2) XxxPtr creation is not thread safe >> >> We think this one is killing us. To put it more concretely, imagine a >> scenario like this... >> >> ---- P ------ >> ChunkMaterialMTRecPtr mat = OSG::ChunkMaterial::create() >> .... >> SharedDataObject.setSharedMaterial(mat); >> >> ---- BG1 ---------- >> GeomMTRecPtr geom = ... >> ChunkMaterialMTRecPtr mat = SharedDataObject.getSharedMaterial(); >> geom->setMaterial(mat) >> >> ---- BG2 ---------- >> GeomMTRecPtr geom = ... >> ChunkMaterialMTRecPtr mat = SharedDataObject.getSharedMaterial(); >> geom->setMaterial(mat) >> >> There is a race condition here on the reference counting. More >> specifically, if multiple ChunkMaterialMTRecPtr's are created to the >> same FC at close to the same time, the reference count can get out of >> sync (we think). This can lead to dangling references and/or access >> to destroyed objects. > > yes, that is indeed a problem in your scenario; it was admittedly not > what we were thinking about when figuring out how the ref counting > should work (well, at least I did not ;) ). > >> This may seem like a no-brainer issue, just don't do it. But it gets >> more complex then that. For example this issue makes it so the simple >> geom utilities in OpenSG are not thread safe because they call >> getDefaultMaterial, which in turn creates RecPtrs that can get out of >> sync. >> More generally, the problem is that RefPtrs can't be treated the same >> as C++ primitive types. They can not be read from in parallel and >> stay consistent. > > AFAIUI you can dereference them in parallel, what breaks is the > concurrent creation of additional pointers to an object (as that > concurrently modifies the ref count). Or did you run into cases where > parallel dereference causes problems as well? You are correct. Unfortunately the ptrs are created constantly as a side-effect of passing them as arguments or returning them by value. So it effectively makes it very difficult to use them. Especially with the constant conversion that happens between MTRecPtr, RecPtr, TransitPtr, and the others I am missing. Now that we know what to look for, I *think* we can find all the little corner cases, it was just difficult to reason through and realize that this was our problem. I am very happy to know now, and it sounds like this is likely the only major problem we were running into. >> The thing that really tripped us up is that we are >> used to using the boost shared_ptr implementation and it does allow >> for this degreee of thread safety. They explain in much better then I >> can: >> http://www.boost.org/doc/libs/1_38_0/libs/smart_ptr/shared_ptr.htm#ThreadSafety >> (copied here for easy access) > [SNIP - boost doc] >> Starting with Boost release 1.33.0, shared_ptr uses a lock-free >> implementation on the following platforms: >> >> * GNU GCC on x86 or x86-64; >> * GNU GCC on IA64; >> * Metrowerks CodeWarrior on PowerPC; >> * GNU GCC on PowerPC; >> * Windows. >> ------------------------------------------ >> >> As you can see, boost goes to great lengths to enable this type of >> use. What I am trying to determine is can OpenSG do the same thing. >> I really don't know, but IMHO it would be very very nice if it could. >> I think it would protect users from this type of nasty bug and it >> would also allow the Ptr types to be used in a way that is consistent >> with boost (and the future TR1). > > well, I can try and see if I at least can find their lock-free > implementation (not too much hope though, boost does not care about > accessibility of implementation details :( ), that should allow us to > understand if we can have something similar, right now I honestly don't > know either. I took a look earlier today and I think they made the implementation detail of the atomic ops and spin locks relatively accessible. Initially it even looked like they could be used directly without having to copy code out of boost. That of course was based on me not understanding at all how OpenSG would need to use it on it's ref counts, so I think it would be much better for you to look yourself. :) Thanks, Allen > > Cheers, > Carsten > > > ------------------------------------------------------------------------------ > Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are > powering Web 2.0 with engaging, cross-platform capabilities. Quickly and > easily build your RIAs with Flex Builder, the Eclipse(TM)based development > software that enables intelligent coding and step-through debugging. > Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com > _______________________________________________ > Opensg-users mailing list > [email protected] > https://lists.sourceforge.net/lists/listinfo/opensg-users > ------------------------------------------------------------------------------ Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are powering Web 2.0 with engaging, cross-platform capabilities. Quickly and easily build your RIAs with Flex Builder, the Eclipse(TM)based development software that enables intelligent coding and step-through debugging. Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com _______________________________________________ Opensg-users mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/opensg-users
