Re: [Opensg-users] Ptrs, threading, and crashes

Gerrit Voß Mon, 23 Mar 2009 21:25:41 -0700

Hi,

On Mon, 2009-03-23 at 21:43 -0600, Allen Bierbaum wrote:
> On Mon, Mar 23, 2009 at 8:27 PM, Carsten Neumann
> <[email protected]> wrote:
> >        Hello Allen,
> >
> > Allen Bierbaum wrote:


> >> 1) The first question is, what issues do the experts on the list see
> >> with this type of arrangement.  Our simplistic thinking had been that
> >> as long as we don't write to the same FC at the same time in the same
> >> thread we would be ok.  Now that we are into it further, we are
> >> worried about a couple of specific issues.
> >>
> >>     * Change lists: As discussed in another posting, we think that the
> >> change list should be thread safe and that as long as we call
> >> commitChanges() we should be good.
> >
> > the change list is a per-thread data structure. I'm not quite sure what
> > you are referring to with "call commitChanges()..."; did you mean call
> > commitChanges() atomically?
> 
> What I meant is that you have to call commitChanges() in each thread
> before you can reliably use the newly created fc's in another thread.
> My understanding from the previous discussions is that this is
> required to get the pending changes to propagate and update
> everything.
> 
> >>     * Aspect store: Each FC has an aspect store.  If two threads read
> >> from an FC in parallel, is it possible to corrupt the aspect store?
> >> (we have seen a few crashes on testing size of the aspect store of an
> >> fc)
> >
> > off hand I don't see how that could be a problem; your threads all run
> > on the same aspect, so there is no need to create new aspect copies and
> > store them in the AspectStore. In other words I would not expect any
> > writes to the AspectStore to happen once a container is created so it
> > should not become corrupted. Do you have more context when this happens?
> 
> Not really.  Just saw in twice today and it could be a side-effect of
> the other things we are talking about.

hmm, I would not expect problems with the aspect store as they are
only written on two occasions, during the initial creation and
during sync. There should be no interference during both times
as nobody can access it during the initial creation and while
syncing everything must be stopped and waiting for the sync
to happen anyway. Meddling with containers while they
are synchronized is not safe.

> >
> >> 2) XxxPtr creation is not thread safe
> >>
> >> We think this one is killing us.  To put it more concretely, imagine a
> >> scenario like this...
> >>
> >> ---- P ------
> >> ChunkMaterialMTRecPtr mat = OSG::ChunkMaterial::create()
> >> ....
> >> SharedDataObject.setSharedMaterial(mat);
> >>
> >> ---- BG1 ----------
> >> GeomMTRecPtr geom = ...
> >> ChunkMaterialMTRecPtr mat = SharedDataObject.getSharedMaterial();
> >> geom->setMaterial(mat)
> >>
> >> ---- BG2 ----------
> >> GeomMTRecPtr geom = ...
> >> ChunkMaterialMTRecPtr mat = SharedDataObject.getSharedMaterial();
> >> geom->setMaterial(mat)
> >>
> >> There is a race condition here on the reference counting.  More
> >> specifically, if multiple ChunkMaterialMTRecPtr's are created to the
> >> same FC at close to the same time, the reference count can get out of
> >> sync (we think).  This can lead to dangling references and/or access
> >> to destroyed objects.
> >
> > yes, that is indeed a problem in your scenario; it was admittedly not
> > what we were thinking about when figuring out how the ref counting
> > should work (well, at least I did not ;) ).
> >
> >> This may seem like a no-brainer issue, just don't do it.  But it gets
> >> more complex then that.  For example this issue makes it so the simple
> >> geom utilities in OpenSG are not thread safe because they call
> >> getDefaultMaterial, which in turn creates RecPtrs that can get out of
> >> sync.
> >> More generally, the problem is that RefPtrs can't be treated the same
> >> as C++ primitive types.  They can not be read from in parallel and
> >> stay consistent.
> >
> > AFAIUI you can dereference them in parallel, what breaks is the
> > concurrent creation of additional pointers to an object (as that
> > concurrently modifies the ref count). Or did you run into cases where
> > parallel dereference causes problems as well?
> 
> You are correct.  Unfortunately the ptrs are created constantly as a
> side-effect of passing them as arguments or returning them by value.
> So it effectively makes it very difficult to use them.  Especially
> with the constant conversion that happens between MTRecPtr, RecPtr,
> TransitPtr, and the others I am missing.
> 
> Now that we know what to look for, I *think* we can find all the
> little corner cases, it was just difficult to reason through and
> realize that this was our problem.  I am very happy to know now, and
> it sounds like this is likely the only major problem we were running
> into.
> 
> >>  The thing that really tripped us up is that we are
> >> used to using the boost shared_ptr implementation and it does allow
> >> for this degreee of thread safety.  They explain in much better then I
> >> can: 
> >> http://www.boost.org/doc/libs/1_38_0/libs/smart_ptr/shared_ptr.htm#ThreadSafety
> >> (copied here for easy access)
> > [SNIP - boost doc]
> >> Starting with Boost release 1.33.0, shared_ptr uses a lock-free
> >> implementation on the following platforms:
> >>
> >>     * GNU GCC on x86 or x86-64;
> >>     * GNU GCC on IA64;
> >>     * Metrowerks CodeWarrior on PowerPC;
> >>     * GNU GCC on PowerPC;
> >>     * Windows.
> >> ------------------------------------------
> >>
> >> As you can see, boost goes to great lengths to enable this type of
> >> use.  What I am trying to determine is can OpenSG do the same thing.
> >> I really don't know, but IMHO it would be very very nice if it could.
> >> I think it would protect users from this type of nasty bug and it
> >> would also allow the Ptr types to be used in a way that is consistent
> >> with boost (and the future TR1).
> >
> > well, I can try and see if I at least can find their lock-free
> > implementation (not too much hope though, boost does not care about
> > accessibility of implementation details :( ), that should allow us to
> > understand if we can have something similar, right now I honestly don't
> > know either.
> 
> I took a look earlier today and I think they made the implementation
> detail of the atomic ops and spin locks relatively accessible.
> Initially it even looked like they could be used directly without
> having to copy code out of boost.  That of course was based on me not
> understanding at all how OpenSG would need to use it on it's ref
> counts, so I think it would be much better for you to look yourself.
> :)
> 

relatively, the problem is that the only consistent thing they export is
the blob (sp_counted_base), not the atomic operations we need, so we
might take some inspiration from there but I guess we have to do the
two required functions (atomicInc, atomicExAndAdd) by ourselves. But
we can fallback to boost where we can.

kind regards,
  gerrit




------------------------------------------------------------------------------
Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are
powering Web 2.0 with engaging, cross-platform capabilities. Quickly and
easily build your RIAs with Flex Builder, the Eclipse(TM)based development
software that enables intelligent coding and step-through debugging.
Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com
_______________________________________________
Opensg-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensg-users

Re: [Opensg-users] Ptrs, threading, and crashes

Reply via email to