Hi David,

I've just started reviewing your submission.  The first things that
jumps out is that changing to a multiset makes the assumption that the
once a FileRequest is inserted it doesn't change it's own sort value.
This is not valid assumption as the design of database paging scheme
is that the priority of each request can change from frame to frame as
it come more into, or goes out further out range - making it more or
less important to load.   The change you've proposed would break this
property so isn't acceptable.

The race condition and implementation of locks/unlocks I haven't yet
got my head around so I will continue looking at this.  One quick
feedback I can provide is that it's not generally safe programming to
lock() and then unlock() mutexes in conditional blocks. It would be
very very easy for a minor code change to end up with locking and
never unlocking or attempting to unlock a mutex that hasn't been
locked.  The ScopedLock is used throughout the OSG code base to
prevent these mistakes from creeping into the code and is also
exception safe.  I have little doubt that the better way to protect
things would be via a ScopedLock, even if these means restructuring
code to enable this.

One really surprising aspect to your post is that you discuss 10,000
to 50,000 DatabaseRequests being generated in your application. This
is several orders of magnitude times greater than I'd ever expect for
an application.  Something has to be seriously wrong in the database
or the app/pager for this to happen.  This really does warrant further
investigation/explanation, the DatabasePager has never been designed
to handle this level of requests, and neither do I expect it ever to
be used with this level of requests, the hardware, OS and software
will never been able to handle tens of thousands of file requests per
second.

How are you able to generate this number of requests?  What type of
database are you working with?  What type of application?

Robert.

On Wed, Jan 27, 2010 at 11:43 PM, David Fries <[email protected]> wrote:
> These files are based on the subversion top as of yesterday, version
> 11008.
>
>
>
> osgDB::DatabasePager in requestNodeFile is sometimes modifying the
> request's data without holding the correct lock.  In every other
> section of DatabasePager, the request queue is known before hand and
> can be locked before accessing a request in that queue.  In this case
> the databaseRequestRef passes a pointer, which could be in any queue,
> and the queue pointer is read from the request.  The problem is the
> DatabasePager is reading the queue pointer from the request without
> holding the lock, and between when it reads the pointer and locks the
> pointer another thread can modify the queue by moving it to another
> queue.  This catch 22 problem is resolved by checking that the queue
> locked is still the same as the pointer in the request.  If it is, we
> won and no other thread can change the queue.  If it isn't, we lost,
> unlock the queue we did lock and try again.  There are only four
> queues it can move between, so there is a very small upper bound on
> how many iterations it will try before succeeding.
>
> In practice the race condition between another thread moving the
> request between the pointer read and mutex lock was hitting in a few
> seconds to a couple of minutes.  This is because compileGLObjects is
> retrieving the first request off the to compile queue then moving it
> to the to merge queue, while that request is probably still needed and
> being re-requested from the scene graph, leading this race condition
> to come up much more frequently than would be expected.
>
>
>
> When using std::vector for RequestList, the DatabasePager sorts the
> list to determine the highest priority request to load.  That
> operation can be very expensive if there are lots of entries, because
> it is sorted for each item extracted.  This also blocks the drawing
> thread because when lock is held for a long period of time the drawing
> thread must wait for the sort to complete before being able to add to
> the request queue.  Instead of frequently sorting the vector, use
> std::multiset, which will keep the list in order.
>
> Here are some numerical results of how long it takes to clear the
> request queue after it has reached 10,000 or 50,000 requests and the
> viewer movement is stopped in the terrapage terrain we have.
> std::vector 10,000 requests, 20 seconds to clear; 50,000 11 minutes to clear
> std::multiset 10,000 requests, less than a second, 50,000 about a second
> Note: requires StatsHandler in osgviewer to see the request queue.
> viewer.addEventHandler(new osgViewer::StatsHandler);
>
> The whole earth blue marble can be used, zoom into the highest level
> of detail and give it a good spin.  If it isn't getting behind load up
> the computer with other tasks.
> http://www.openscenegraph.org/data/earth_bayarea/earth.ive
>
> --
> David Fries <[email protected]>
> http://fries.net/~david/ (PGP encryption key available)
>
> _______________________________________________
> osg-submissions mailing list
> [email protected]
> http://lists.openscenegraph.org/listinfo.cgi/osg-submissions-openscenegraph.org
>
>
_______________________________________________
osg-submissions mailing list
[email protected]
http://lists.openscenegraph.org/listinfo.cgi/osg-submissions-openscenegraph.org

Reply via email to