Re: [osg-users] Question about views, contexts and threading

2008-11-24 Thread Ferdi Smit

Robert,

I tried your suggestion, but it didn't have any effect. It's probably a 
driver issue then (nvidia 180.06 beta). I should receive a dual GTX260 
system any day now; I'll try and see if that works better.


Robert Osfield wrote:

Hi Ferdi,

Could you try the same tests but with the following env var set:

  set OSG_SERIALIZE_DRAW_DISPATCH=OFF

This will disable the mutex that serializes the draw dispatch.  Have a
search through the archives on this topic, as I've written lots about
it and about the fact that serializing the draw dispatch curiously
improves performance on the systems that I've tested on.  I still
haven't had feedback from the community on this topic, as it's likely
to be something affected by hardware/drivers and OS.

Robert.

On Thu, Nov 20, 2008 at 4:05 PM, Ferdi Smit [EMAIL PROTECTED] wrote:
  

Thank you, that at least explains some of the drawing times I've been
seeing.

I ran more tests on our dual-gpu system, summarized below. Not strictly OSG
related, but they may be interesting nonetheless...

- Scene of 25x a 1 million polygon model, all visible. Culling etc. negligible.
- Stand-alone refers to one rendering context only; normal, non-parallel
rendering
- frame rates in FPS

CPU Affinity on different cores
OSG_THREADING=SingleThreaded
(1 core shows heavy use, 2nd core shows moderate use, 2 cores idle)

                                Quadro 5600   8800GTX
Single-GPU / Stand-alone        16            15

Single-GPU / Multi-Threaded     7.5           7.5
Single-GPU / Multi-Processing   7.5           7.5

Multi-GPU / Multi-Threaded      6.5           6.5
Multi-GPU / Multi-Processing    16            15

                                Quadro 5600   8800GTX

OSG_THREADING=ThreadPerContext
(CPU Affinity is set but appears to be ignored: 1 core shows heavy use,
others idle)

Single-GPU / Stand-alone        16            15

Single-GPU / Multi-Threaded     7.5           7.5
Single-GPU / Multi-Processing   7.5           7.5

Multi-GPU / Multi-Threaded      3.5           11
Multi-GPU / Multi-Processing    11            14


                                Quadro 5600   8800GTX
Baseline:
Multi-GPU / Multi-Threaded      6.5           6.5

Speeding up one card by rendering empty scene*, effect on other card:
Multi-GPU / Multi-Threaded      6000*         15
Multi-GPU / Multi-Threaded      7             14*


All results are reasonable, except:

Single-GPU / Multi-Processing   7.5           7.5
Multi-GPU / Multi-Threaded      6.5           6.5
Multi-GPU / Multi-Processing    16            15

This is very strange: using two distinct GPUs simultaneously in a threaded
way in the same address space is slower than sharing a single GPU. I can
only conclude that OpenGL drivers cannot handle multi-threading with
different contexts on different devices. It also seems that the Quadro is
the culprit, locking the driver or something. If you let the Quadro render
fast, the 8800 also renders fast. However, if you allow the 8800 to render
fast, both will remain slow.




--
Regards,

Ferdi Smit
INS3 Visualization and 3D Interfaces
CWI Amsterdam, The Netherlands

___
osg-users mailing list
osg-users@lists.openscenegraph.org
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org






--
Regards,

Ferdi Smit
INS3 Visualization and 3D Interfaces
CWI Amsterdam, The Netherlands

___
osg-users mailing list
osg-users@lists.openscenegraph.org
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org


Re: [osg-users] Question about views, contexts and threading

2008-11-20 Thread Ferdi Smit
Thank you, that at least explains some of the drawing times I've been 
seeing.


I ran more tests on our dual-gpu system, summarized below. Not strictly 
OSG related, but they may be interesting nonetheless...


- Scene of 25x a 1 million polygon model, all visible. Culling etc. negligible.
- Stand-alone refers to one rendering context only; normal, non-parallel 
rendering

- frame rates in FPS
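
For reference, the threading modes used below can also be forced from code
rather than via the environment; a minimal sketch, assuming the OpenThreads
affinity helper of this era:

    #include <osgViewer/Viewer>
    #include <OpenThreads/Thread>

    osgViewer::Viewer viewer;
    // programmatic equivalent of OSG_THREADING=SingleThreaded
    viewer.setThreadingModel(osgViewer::ViewerBase::SingleThreaded);
    // or of OSG_THREADING=ThreadPerContext:
    // viewer.setThreadingModel(osgViewer::ViewerBase::CullDrawThreadPerContext);

    // pin the calling (frame-loop) thread to core 0; this helper is my
    // assumption about the OpenThreads API available at the time
    OpenThreads::SetProcessorAffinityOfCurrentThread(0);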

CPU Affinity on different cores
OSG_THREADING=SingleThreaded
(1 core shows heavy use, 2nd core shows moderate use, 2 cores idle)

                                Quadro 5600   8800GTX
Single-GPU / Stand-alone        16            15

Single-GPU / Multi-Threaded     7.5           7.5
Single-GPU / Multi-Processing   7.5           7.5

Multi-GPU / Multi-Threaded      6.5           6.5
Multi-GPU / Multi-Processing    16            15

                                Quadro 5600   8800GTX

OSG_THREADING=ThreadPerContext
(CPU Affinity is set but appears to be ignored: 1 core shows heavy use,
others idle)

Single-GPU / Stand-alone        16            15

Single-GPU / Multi-Threaded     7.5           7.5
Single-GPU / Multi-Processing   7.5           7.5

Multi-GPU / Multi-Threaded      3.5           11
Multi-GPU / Multi-Processing    11            14


                                Quadro 5600   8800GTX
Baseline:
Multi-GPU / Multi-Threaded      6.5           6.5

Speeding up one card by rendering empty scene*, effect on other card:
Multi-GPU / Multi-Threaded      6000*         15
Multi-GPU / Multi-Threaded      7             14*


All results are reasonable, except:

Single-GPU / Multi-Processing   7.5           7.5
Multi-GPU / Multi-Threaded      6.5           6.5
Multi-GPU / Multi-Processing    16            15

This is very strange: using two distinct GPUs simultaneously in a 
threaded way in the same address space is slower than sharing a single 
GPU. I can only conclude that OpenGL drivers cannot handle 
multi-threading with different contexts on different devices. It also 
seems that the Quadro is the culprit, locking the driver or something. 
If you let the Quadro render fast, the 8800 also renders fast. However, 
if you allow the 8800 to render fast, both will remain slow.


Robert Osfield wrote:

Hi Ferdi,

To understand what is happening with draw in the two instances, you
need to understand how OpenGL operates.  For each graphics context
OpenGL maintains a FIFO that is filled by the application's graphics
thread for that context, and is drained by the driver, which batches up
the commands/data in the FIFO into a form that can be pushed to the
graphics card.

Now if this FIFO has plenty of room then the application can keep
filling the FIFO without OpenGL ever blocking the application's
graphics thread - in this case the draw dispatch times (the OSG side)
are relatively low.  If however you fill the FIFO then OpenGL will
block the application's graphics thread until enough room has been made
by the GPU consuming commands/data at the other end.  When you get to
this point you'll often find draw dispatch times suddenly jump up, and
it's not because draw is suddenly doing more work - in fact the app's
graphics thread is just sitting there idle, waiting for the graphics
driver/GPU to do its stuff.

Now drivers may have different-sized FIFOs, and different GPUs will
work at different speeds and possibly have other features that affect
the FIFO filling/emptying.  One would expect slower GPUs to empty the
FIFO more slowly, and so be more likely to block, but the driver can
also have an effect.  The architecture of the overall hardware, what
other threads are running, how contended the various parts of the
hardware are, etc. can all have an effect.  The fact that one GPU's
draw dispatch is far longer than another's might simply mean that it's
pushed just hard enough to fill the FIFO; it might still hit frame rate
just fine, but the draw times will be drastically higher because of the
blocking on the filled FIFO, while a slightly lower load could lead to
the FIFO not blocking and a huge drop in draw dispatch times.  It's
very non-linear: small differences can result in large observed
differences, but often the long draw time might not be anything to
worry about - it's just an early warning sign; you might still hit your
target frame rate just fine.

Robert.

On Tue, Nov 18, 2008 at 3:31 PM, Ferdi Smit [EMAIL PROTECTED] wrote:
  

Hi Robert,

I ran some more tests with a realistic scene of ~25M polygons (25 times the
same 1M model). Stand-alone this is rendered at ~15 FPS on one GPU (8800GTX
or Quadro FX5600 + Intel Quad Core). Multi-_processing_ with two contexts at
two gpus, both rendering this scene, the 8800 stays at 15 but the Quadro
drops to 12. Multi-_threading_ with two contexts at two gpus, the 8800 drops
to 9.5 and the quadro to 4.5 FPS. This is weird. Also, the 8800 reports (in
the osg performance hud) that GPU=65 and Draw=10. Draw is always much lower
than GPU. But the Quadro in multi-threading goes to GPU=210 and Draw=210;
GPU and Draw are suddenly equal now. What does 

Re: [osg-users] Question about views, contexts and threading

2008-11-20 Thread Robert Osfield
Hi Ferdi,

Could you try the same tests but with the following env var set:

  set OSG_SERIALIZE_DRAW_DISPATCH=OFF

This will disable the mutex that serializes the draw dispatch.  Have a
search through the archives on this topic, as I've written lots about
it and about the fact that serializing the draw dispatch curiously
improves performance on the systems that I've tested on.  I still
haven't had feedback from the community on this topic, as it's likely
to be something affected by hardware/drivers and OS.
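
A minimal sketch of doing the same from code; the env var just has to be
set before the viewer starts its threads, e.g. at the top of main():

    #include <cstdlib>

    int main(int argc, char** argv)
    {
        // equivalent of "set OSG_SERIALIZE_DRAW_DISPATCH=OFF"
    #ifdef _WIN32
        _putenv("OSG_SERIALIZE_DRAW_DISPATCH=OFF");
    #else
        setenv("OSG_SERIALIZE_DRAW_DISPATCH", "OFF", 1);
    #endif
        // ... construct and run the viewer as usual ...
        return 0;
    }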

Robert.

On Thu, Nov 20, 2008 at 4:05 PM, Ferdi Smit [EMAIL PROTECTED] wrote:
 Thank you, that at least explains some of the drawing times I've been
 seeing.

 I ran more tests on our dual-gpu system, summarized below. Not strictly OSG
 related, but they may be interesting nonetheless...

 - Scene of 25x a 1 million polygon model, all visible. Culling etc. negligible.
 - Stand-alone refers to one rendering context only; normal, non-parallel
 rendering
 - frame rates in FPS

 CPU Affinity on different cores
 OSG_THREADING=SingleThreaded
 (1 core shows heavy use, 2nd core shows moderate use, 2 cores idle)

                                 Quadro 5600   8800GTX
 Single-GPU / Stand-alone        16            15

 Single-GPU / Multi-Threaded     7.5           7.5
 Single-GPU / Multi-Processing   7.5           7.5

 Multi-GPU / Multi-Threaded      6.5           6.5
 Multi-GPU / Multi-Processing    16            15

                                 Quadro 5600   8800GTX

 OSG_THREADING=ThreadPerContext
 (CPU Affinity is set but appears to be ignored: 1 core shows heavy use,
 others idle)

 Single-GPU / Stand-alone        16            15

 Single-GPU / Multi-Threaded     7.5           7.5
 Single-GPU / Multi-Processing   7.5           7.5

 Multi-GPU / Multi-Threaded      3.5           11
 Multi-GPU / Multi-Processing    11            14


                                 Quadro 5600   8800GTX
 Baseline:
 Multi-GPU / Multi-Threaded      6.5           6.5

 Speeding up one card by rendering empty scene*, effect on other card:
 Multi-GPU / Multi-Threaded      6000*         15
 Multi-GPU / Multi-Threaded      7             14*


 All results are reasonable, except:

 Single-GPU / Multi-Processing   7.5           7.5
 Multi-GPU / Multi-Threaded      6.5           6.5
 Multi-GPU / Multi-Processing    16            15

 This is very strange: using two distinct GPUs simultaneously in a threaded
 way in the same address space is slower than sharing a single GPU. I can
 only conclude that OpenGL drivers cannot handle multi-threading with
 different contexts on different devices. It also seems that the Quadro is
 the culprit, locking the driver or something. If you let the Quadro render
 fast, the 8800 also renders fast. However, if you allow the 8800 to render
 fast, both will remain slow.

 Robert Osfield wrote:

 Hi Ferdi,

 To understand what is happening with draw in the two instances, you
 need to understand how OpenGL operates.  For each graphics context
 OpenGL maintains a FIFO that is filled by the application's graphics
 thread for that context, and is drained by the driver, which batches up
 the commands/data in the FIFO into a form that can be pushed to the
 graphics card.

 Now if this FIFO has plenty of room then the application can keep
 filling the FIFO without OpenGL ever blocking the application's
 graphics thread - in this case the draw dispatch times (the OSG side)
 are relatively low.  If however you fill the FIFO then OpenGL will
 block the application's graphics thread until enough room has been made
 by the GPU consuming commands/data at the other end.  When you get to
 this point you'll often find draw dispatch times suddenly jump up, and
 it's not because draw is suddenly doing more work - in fact the app's
 graphics thread is just sitting there idle, waiting for the graphics
 driver/GPU to do its stuff.

 Now drivers may have different-sized FIFOs, and different GPUs will
 work at different speeds and possibly have other features that affect
 the FIFO filling/emptying.  One would expect slower GPUs to empty the
 FIFO more slowly, and so be more likely to block, but the driver can
 also have an effect.  The architecture of the overall hardware, what
 other threads are running, how contended the various parts of the
 hardware are, etc. can all have an effect.  The fact that one GPU's
 draw dispatch is far longer than another's might simply mean that it's
 pushed just hard enough to fill the FIFO; it might still hit frame rate
 just fine, but the draw times will be drastically higher because of the
 blocking on the filled FIFO, while a slightly lower load could lead to
 the FIFO not blocking and a huge drop in draw dispatch times.  It's
 very non-linear: small differences can result in large observed
 differences, but often the long draw time might not be anything to
 worry about - it's just an early warning sign; you might still hit your
 target frame rate just fine.

 Robert.

 On Tue, Nov 18, 2008 at 3:31 PM, Ferdi Smit [EMAIL PROTECTED] wrote:


 Hi Robert,

 I ran some more tests with a 

Re: [osg-users] Question about views, contexts and threading

2008-11-18 Thread Ferdi Smit

Hi Robert,

I ran some more tests with a realistic scene of ~25M polygons (25 times 
the same 1M model). Stand-alone this is rendered at ~15 FPS on one GPU 
(8800GTX or Quadro FX5600 + Intel Quad Core). Multi-_processing_ with 
two contexts at two gpus, both rendering this scene, the 8800 stays at 
15 but the Quadro drops to 12. Multi-_threading_ with two contexts at 
two gpus, the 8800 drops to 9.5 and the quadro to 4.5 FPS. This is 
weird. Also, the 8800 reports (in the osg performance hud) that GPU=65 
and Draw=10. Draw is always much lower than GPU. But the Quadro in 
multi-threading goes to GPU=210 and Draw=210; GPU and Draw are suddenly 
equal now. What does this Draw statistic represent? Is it time spent in 
driver draw calls?


I suspect buggy Quadro drivers, but I'm not sure. It's the only system I 
can test on. I'm sorry if this diverts from a pure OSG discussion; 
perhaps I should take it to an nvidia forum.


Robert Osfield wrote:

Hi Ferdi,

W.r.t. performance and stability of multi-threading the graphics: as
long as you have two GPUs, the most efficient way to drive them should
be multi-threaded. There is a caveat though - hardware and drivers
aren't always up to scratch, and even where they should be able to
manage multiple threads and multiple GPUs seamlessly, they fail to.

I'm poised to build a new machine based on the new Intel Core i7 and
X58 motherboard; it'll be interesting to see how well it scales.

W.r.t. PBO readback - it's very, very sensitive to the pixel formats you
use.  See the osgscreencapture example.

Robert.

On Mon, Nov 17, 2008 at 5:31 PM, Ferdi Smit [EMAIL PROTECTED] wrote:
  

Thanks Robert. I did a quick test with two viewers from two threads and it
appears to be working. Btw, from my experience, PBO doesn't seem to be any
faster (and on some hardware much slower) for downloading textures to host
than glReadPixels, while for uploads it is almost consistently faster.
Anyway, that should not be a problem, even to code it manually.

One question about the OpenGL driver, are you by any chance aware of any
threading issues? Is it completely re-entrant from two different contexts
and threads? With this two-thread setup, I see some occasional erratic
fluctuation in drawing time in the osg performance hud for a completely
still scene. The GPU performance is very stable, regardless of the load on
the other card, but the drawing time (software) sometimes goes from
something like 0.4 to 2.6 or 1.5 for a couple of frames. I do not notice
this, or not as much, when using two separate processes instead of two
threads. The only difference I can think of here is that the OpenGL driver
part is in the same address space and maybe internally locks occasionally?
Or is this nonsense?

Anyway, the osg part seems to be fairly straightforward and simple like
this. Thanks.


Robert Osfield wrote:


Hi Ferdi,

osgViewer::CompositeViewer runs all of the views synchronously - one
frame() call dispatches update, event, cull and draw traversals for
all the views.  So for your case, where you want them to run async, this
isn't supported.  Supporting it within CompositeViewer would really
complicate the API, so it's not something I've gone for.

What you will be able to do is use two separate Viewers.  You are
likely to want to run two threads, one for each of the viewers' frame
loops, as well.  To get the render-to-image result to the second viewer,
all you need to do is assign the same osg::Image to the first viewer's
Camera for it to copy to, and then attach the same osg::Image to a
texture in the scene of the second viewer.  The OSG should
automatically do the glReadPixels to the image data, dirty the Image,
and then the texture will automatically update in the second viewer.
You could potentially optimize things by using a PBO, but the
off-the-shelf osg::PixelBufferObject isn't suitable for reading in this
way, so you'll need to roll your own support for this.

It's worth noting that I've never written an app like the above, so you
are rather working on the bleeding edge.  I think it should work, or
at least I can't spot any major problems that might appear.

Robert.

On Mon, Nov 17, 2008 at 9:37 AM, Ferdi Smit [EMAIL PROTECTED] wrote:

  

I'm looking to do the following in OSG, and I wonder if I'm on the right
track (before wasting time needlessly): have two render processes run in
parallel on two different GPUs, have one render a scene to texture and
let
this texture be read by the other process and mapped to an object in a
different scene. The problem: the rendering of the first scene to texture is
very slow and the rendering of the second scene is very fast.

I intend to solve it in the following way in pseudo-code:

- new CompositeViewer
- Add two Views
- Construct two contexts, one on localhost:0.0, one on localhost:0.1
- Attach contexts to cameras of corresponding Views
- Set composite viewer threading mode to thread-per-context

--- First process
- Set view camera mode to FBO and pre-render
- Add post-draw 

Re: [osg-users] Question about views, contexts and threading

2008-11-18 Thread Robert Osfield
Hi Ferdi,

To understand what is happening with draw in the two instances, you
need to understand how OpenGL operates.  For each graphics context
OpenGL maintains a FIFO that is filled by the application's graphics
thread for that context, and is drained by the driver, which batches up
the commands/data in the FIFO into a form that can be pushed to the
graphics card.

Now if this FIFO has plenty of room then the application can keep
filling the FIFO without OpenGL ever blocking the application's
graphics thread - in this case the draw dispatch times (the OSG side)
are relatively low.  If however you fill the FIFO then OpenGL will
block the application's graphics thread until enough room has been made
by the GPU consuming commands/data at the other end.  When you get to
this point you'll often find draw dispatch times suddenly jump up, and
it's not because draw is suddenly doing more work - in fact the app's
graphics thread is just sitting there idle, waiting for the graphics
driver/GPU to do its stuff.

Now drivers may have different-sized FIFOs, and different GPUs will
work at different speeds and possibly have other features that affect
the FIFO filling/emptying.  One would expect slower GPUs to empty the
FIFO more slowly, and so be more likely to block, but the driver can
also have an effect.  The architecture of the overall hardware, what
other threads are running, how contended the various parts of the
hardware are, etc. can all have an effect.  The fact that one GPU's
draw dispatch is far longer than another's might simply mean that it's
pushed just hard enough to fill the FIFO; it might still hit frame rate
just fine, but the draw times will be drastically higher because of the
blocking on the filled FIFO, while a slightly lower load could lead to
the FIFO not blocking and a huge drop in draw dispatch times.  It's
very non-linear: small differences can result in large observed
differences, but often the long draw time might not be anything to
worry about - it's just an early warning sign; you might still hit your
target frame rate just fine.
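
A crude way to see this effect in isolation (illustrative sketch only;
drawScene() is a hypothetical stand-in for whatever issues your GL
commands):

    #include <osg/GL>
    #include <osg/Timer>

    osg::Timer* timer = osg::Timer::instance();

    osg::Timer_t t0 = timer->tick();
    drawScene();                 // hypothetical: issue all GL draw commands
    osg::Timer_t t1 = timer->tick();
    glFinish();                  // wait for the GPU to drain the FIFO
    osg::Timer_t t2 = timer->tick();

    // dispatch time: normally small, but it jumps up as soon as the FIFO
    // fills and the driver blocks the calls above
    double dispatchMs = timer->delta_m(t0, t1);
    // drain time: the queued GPU work the dispatch was running ahead of
    double drainMs    = timer->delta_m(t1, t2);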

Robert.

On Tue, Nov 18, 2008 at 3:31 PM, Ferdi Smit [EMAIL PROTECTED] wrote:
 Hi Robert,

 I ran some more tests with a realistic scene of ~25M polygons (25 times the
 same 1M model). Stand-alone this is rendered at ~15 FPS on one GPU (8800GTX
 or Quadro FX5600 + Intel Quad Core). Multi-_processing_ with two contexts at
 two gpus, both rendering this scene, the 8800 stays at 15 but the Quadro
 drops to 12. Multi-_threading_ with two contexts at two gpus, the 8800 drops
 to 9.5 and the quadro to 4.5 FPS. This is weird. Also, the 8800 reports (in
 the osg performance hud) that GPU=65 and Draw=10. Draw is always much lower
 than GPU. But the Quadro in multi-threading goes to GPU=210 and Draw=210;
 GPU and Draw are suddenly equal now. What does this Draw statistic
 represent? Is it time spent in driver draw calls?

 I suspect buggy Quadro drivers, but I'm not sure. It's the only system I can
 test on. I'm sorry if this diverts from a pure OSG discussion; perhaps I
 should take it to an nvidia forum.

 Robert Osfield wrote:

 Hi Ferdi,

 W.r.t. performance and stability of multi-threading the graphics: as
 long as you have two GPUs, the most efficient way to drive them should
 be multi-threaded. There is a caveat though - hardware and drivers
 aren't always up to scratch, and even where they should be able to
 manage multiple threads and multiple GPUs seamlessly, they fail to.

 I'm poised to build a new machine based on the new Intel Core i7 and
 X58 motherboard; it'll be interesting to see how well it scales.

 W.r.t. PBO readback - it's very, very sensitive to the pixel formats you
 use.  See the osgscreencapture example.

 Robert.

 On Mon, Nov 17, 2008 at 5:31 PM, Ferdi Smit [EMAIL PROTECTED] wrote:


 Thanks Robert. I did a quick test with two viewers from two threads and
 it
 appears to be working. Btw, from my experience, PBO doesn't seem to be
 any
 faster (and on some hardware much slower) for downloading textures to
 host
 than glReadPixels, while for uploads it is almost consistently faster.
 Anyway, that should not be a problem, even to code it manually.

 One question about the OpenGL driver, are you by any chance aware of any
 threading issues? Is it completely re-entrant from two different contexts
 and threads? With this two-thread setup, I see some occasional erratic
 fluctuation in drawing time in the osg performance hud for a completely
 still scene. The GPU performance is very stable, regardless of the load
 on
 the other card, but the drawing time (software) sometimes goes from
 something like 0.4 to 2.6 or 1.5 for a couple of frames. I do not notice
 this, or not as much, when using two separate processes instead of two
 threads. The only difference I can think of here is that the OpenGL
 driver
 part is in the same address space and maybe internally locks
 occasionally?
 Or is this nonsense?

 Anyway, the osg part seems to be fairly straightforward and simple like
 this. 

[osg-users] Question about views, contexts and threading

2008-11-17 Thread Ferdi Smit
I'm looking to do the following in OSG, and I wonder if I'm on the right 
track (before wasting time needlessly): have two render processes run in 
parallel on two different GPUs, have one render a scene to texture and 
let this texture be read by the other process and mapped to an object in 
a different scene. The problem: the rendering of the first scene to texture 
is very slow and the rendering of the second scene is very fast.


I intend to solve it in the following way in pseudo-code:

- new CompositeViewer
- Add two Views
- Construct two contexts, one on localhost:0.0, one on localhost:0.1
- Attach contexts to cameras of corresponding Views
- Set composite viewer threading mode to thread-per-context

--- First process
- Set view camera mode to FBO and pre-render
- Add post-draw callback and render textures
- Download texture to host memory in post-draw callback
- (possibly add post-render camera to render textured screen quad as output)

--- Second process
- Add update-callback and regular texture
- Upload host memory to texture in update callback (if available, 
non-blocking)
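
In OSG 2.x terms, the context construction sketched above could look
roughly like this (a hedged sketch; window sizes are placeholders and the
FBO/texture wiring is omitted):

    #include <osgViewer/CompositeViewer>
    #include <osgViewer/View>

    // one context per X screen, i.e. localhost:0.0 and localhost:0.1
    osg::GraphicsContext* createContextOnScreen(int screenNum)
    {
        osg::ref_ptr<osg::GraphicsContext::Traits> traits =
            new osg::GraphicsContext::Traits;
        traits->hostName     = "localhost";
        traits->displayNum   = 0;
        traits->screenNum    = screenNum;
        traits->width        = 1024;
        traits->height       = 768;
        traits->doubleBuffer = true;
        return osg::GraphicsContext::createGraphicsContext(traits.get());
    }

    osgViewer::CompositeViewer viewer;
    for (int screen = 0; screen < 2; ++screen)
    {
        osg::ref_ptr<osgViewer::View> view = new osgViewer::View;
        view->getCamera()->setGraphicsContext(createContextOnScreen(screen));
        view->getCamera()->setViewport(new osg::Viewport(0, 0, 1024, 768));
        viewer.addView(view.get());
    }
    viewer.setThreadingModel(osgViewer::ViewerBase::CullDrawThreadPerContext);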


The downloading and uploading of textures uses multiple slots and 
regular thread locking, to ensure we never read and write the same 
memory at the same time. The second process doesn't block if no new 
texture is available; it just continues using the old one.


Some questions. Will the two processes now run at independent frame 
rates, or will the composite viewer synchronize them? I need them to run 
independently. I read OSG does not support multi-threaded updating of 
the scene graph. However, if I use two distinct scene graphs with two 
contexts, I can _pull_ updates in an update callback from another 
thread, right (a sketch of this pull pattern follows below)? What I 
cannot do is push updates at arbitrary times; that would make sense. 
How do I make the TrackballManipulator work for only the first process? 
It seems that as soon as I set that camera to FBO it just doesn't 
respond to events (or maybe something else is wrong...  I added another 
orthogonal camera to view1->getCamera() that renders the screenquad 
in post-render mode). Also, the second process camera is affected when 
I move the mouse in the first process window. Is it sufficient to call 
view2->getCamera()->setAllowEventFocus(false); to disable this behavior? 
Finally, can I do this the same way with a shared context on a single 
GPU (i.e. both on :0.0), sharing texture data directly on the GPU in 
different textures? Ignoring the slow context switching issues for the 
time being.
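
A rough sketch of that pull pattern, assuming a hypothetical shared slot
filled by the producer thread (only the update callback ever touches the
scene graph):

    #include <cstring>
    #include <vector>
    #include <osg/Image>
    #include <osg/NodeCallback>
    #include <OpenThreads/Mutex>
    #include <OpenThreads/ScopedLock>

    // hypothetical slot written by the other thread
    struct SharedSlot
    {
        SharedSlot() : fresh(false) {}
        OpenThreads::Mutex mutex;
        bool fresh;
        std::vector<unsigned char> pixels;
    };

    struct PullUpdateCallback : public osg::NodeCallback
    {
        PullUpdateCallback(SharedSlot& slot, osg::Image* image)
            : _slot(slot), _image(image) {}

        virtual void operator()(osg::Node* node, osg::NodeVisitor* nv)
        {
            {
                OpenThreads::ScopedLock<OpenThreads::Mutex> lock(_slot.mutex);
                if (_slot.fresh)   // non-blocking: skip if nothing new
                {
                    // copy the new texels in and mark the image dirty so
                    // the texture re-uploads; sizes assumed already set up
                    std::memcpy(_image->data(), &_slot.pixels[0],
                                _slot.pixels.size());
                    _image->dirty();
                    _slot.fresh = false;
                }
            }
            traverse(node, nv);
        }

        SharedSlot&              _slot;
        osg::ref_ptr<osg::Image> _image;
    };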


Am I on the right track here, or should this be done differently? I 
know all this is possible because I have the manual OpenGL code for it 
working, both using shared contexts and with up/downloading of texture data.


--
Regards,

Ferdi Smit
INS3 Visualization and 3D Interfaces
CWI Amsterdam, The Netherlands

___
osg-users mailing list
osg-users@lists.openscenegraph.org
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org


Re: [osg-users] Question about views, contexts and threading

2008-11-17 Thread Robert Osfield
Hi Ferdi,

osgViewer::CompositeViewer runs all of the views synchronously - one
frame() call dispatches update, event, cull and draw traversals for
all the views.  So for your case, where you want them to run async, this
isn't supported.  Supporting it within CompositeViewer would really
complicate the API, so it's not something I've gone for.

What you will be able to do is use two separate Viewers.  You are
likely to want to run two threads, one for each of the viewers' frame
loops, as well.  To get the render-to-image result to the second viewer,
all you need to do is assign the same osg::Image to the first viewer's
Camera for it to copy to, and then attach the same osg::Image to a
texture in the scene of the second viewer.  The OSG should
automatically do the glReadPixels to the image data, dirty the Image,
and then the texture will automatically update in the second viewer.
You could potentially optimize things by using a PBO, but the
off-the-shelf osg::PixelBufferObject isn't suitable for reading in this
way, so you'll need to roll your own support for this.
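
A minimal sketch of that hand-off (sizes/formats are placeholders, and
viewer1/viewer2 are assumed to run their frame loops in separate threads):

    #include <osg/Image>
    #include <osg/Texture2D>
    #include <osgViewer/Viewer>

    osg::ref_ptr<osg::Image> image = new osg::Image;
    image->allocateImage(1024, 768, 1, GL_RGBA, GL_UNSIGNED_BYTE);

    // first viewer: OSG does the glReadPixels into 'image' during draw
    // and dirties it
    viewer1.getCamera()->attach(osg::Camera::COLOR_BUFFER, image.get());

    // second viewer: a texture watching the same image; the dirty()
    // triggers the re-upload in this viewer's context
    osg::ref_ptr<osg::Texture2D> texture = new osg::Texture2D(image.get());
    // ... assign 'texture' to a StateSet in the second viewer's scene ...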

It's worth noting that I've never written an app like the above, so you
are rather working on the bleeding edge.  I think it should work, or
at least I can't spot any major problems that might appear.

Robert.

On Mon, Nov 17, 2008 at 9:37 AM, Ferdi Smit [EMAIL PROTECTED] wrote:
 I'm looking to do the following in OSG, and I wonder if I'm on the right
 track (before wasting time needlessly): have two render processes run in
 parallel on two different GPUs, have one render a scene to texture and let
 this texture be read by the other process and mapped to an object in a
 different scene. The problem: the rendering of the first scene to texture is
 very slow and the rendering of the second scene is very fast.

 I intend to solve it in the following way in pseudo-code:

 - new CompositeViewer
 - Add two Views
 - Construct two contexts, one on localhost:0.0, one on localhost:0.1
 - Attach contexts to cameras of corresponding Views
 - Set composite viewer threading mode to thread-per-context

 --- First process
 - Set view camera mode to FBO and pre-render
 - Add post-draw callback and render textures
 - Download texture to host memory in post-draw callback
 - (possibly add post-render camera to render textured screen quad as output)

 --- Second process
 - Add update-callback and regular texture
 - Upload host memory to texture in update callback (if available,
 non-blocking)

 The downloading and uploading of textures uses multiple slots and regular
 thread locking, to ensure we never read and write the same memory at the
 same time. The second process doesn't block if no new texture is available;
 it just continues using the old one.

 Some questions. Will the two processes now run at independent frame rates,
 or will the composite viewer synchronize them? I need them to run
 independently. I read OSG does not support multi-threaded updating of the
 scene graph. However, if I use two distinct scene graphs with two contexts,
 I can _pull_ updates in an update callback from another thread, right? What
 I cannot do is push updates at arbitrary times; that would make sense. How
 do I make the TrackballManipulator work for only the first process? It seems
 that as soon as I set that camera to FBO it just doesn't respond to events
 (or maybe something else is wrong...  I added another orthogonal camera to
 the view1->getCamera() that renders the screenquad in post-render mode).
 Also, the second process camera is affected when I move the mouse in the
 first process window. Is it sufficient to call
 view2->getCamera()->setAllowEventFocus(false); to disable this behavior?
 Finally, can I do this the same way with a shared context on a single GPU
 (i.e. both on :0.0) sharing texture data directly on the GPU in different
 textures? Ignoring the slow context switching issues for the time being.

 Am I on the right track here, or should this be done differently? I know
 all this is possible because I have the manual OpenGL code for it working,
 both using shared contexts and with up/downloading of texture data.

 --
 Regards,

 Ferdi Smit
 INS3 Visualization and 3D Interfaces
 CWI Amsterdam, The Netherlands

___
osg-users mailing list
osg-users@lists.openscenegraph.org
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org


Re: [osg-users] Question about views, contexts and threading

2008-11-17 Thread Ferdi Smit
Thanks Robert. I did a quick test with two viewers from two threads and 
it appears to be working. Btw, from my experience, PBO doesn't seem to 
be any faster (and on some hardware much slower) for downloading 
textures to host than glReadPixels, while for uploads it is almost 
consistently faster. Anyway, that should not be a problem, even to code 
it manually.
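
For reference, the manual route looks roughly like this (raw GL sketch,
assuming an ARB_pixel_buffer_object-capable context and an extension
loader; width/height are placeholders):

    // frame N: queue an asynchronous read into the PBO; glReadPixels
    // returns as soon as the copy is scheduled
    GLuint pbo;
    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4, 0, GL_STREAM_READ);
    glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, 0);

    // frame N+1 (ideally with a second PBO in flight): map and copy out;
    // this is where the transfer cost actually shows up
    void* src = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
    if (src)
    {
        // ... copy into the host-side slot shared with the other viewer ...
        glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);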


One question about the OpenGL driver, are you by any chance aware of any 
threading issues? Is it completely re-entrant from two different 
contexts and threads? With this two-thread setup, I see some occasional 
erratic fluctuation in drawing time in the osg performance hud for a 
completely still scene. The GPU performance is very stable, regardless 
of the load on the other card, but the drawing time (software) sometimes 
goes from something like 0.4 to 2.6 or 1.5 for a couple of frames. I do 
not notice this, or not as much, when using two separate processes 
instead of two threads. The only difference I can think of here is that 
the OpenGL driver part is in the same address space and maybe internally 
locks occasionally? Or is this nonsense?


Anyway, the osg part seems to be fairly straightforward and simple like 
this. Thanks.



Robert Osfield wrote:

Hi Ferdi,

osgViewer::CompositeViewer runs all of the views synchronously - one
frame() call dispatches update, event, cull and draw traversals for
all the views.  So for your case, where you want them to run async, this
isn't supported.  Supporting it within CompositeViewer would really
complicate the API, so it's not something I've gone for.

What you will be able to do is use two separate Viewers.  You are
likely to want to run two threads, one for each of the viewers' frame
loops, as well.  To get the render-to-image result to the second viewer,
all you need to do is assign the same osg::Image to the first viewer's
Camera for it to copy to, and then attach the same osg::Image to a
texture in the scene of the second viewer.  The OSG should
automatically do the glReadPixels to the image data, dirty the Image,
and then the texture will automatically update in the second viewer.
You could potentially optimize things by using a PBO, but the
off-the-shelf osg::PixelBufferObject isn't suitable for reading in this
way, so you'll need to roll your own support for this.

It's worth noting that I've never written an app like the above, so you
are rather working on the bleeding edge.  I think it should work, or
at least I can't spot any major problems that might appear.

Robert.

On Mon, Nov 17, 2008 at 9:37 AM, Ferdi Smit [EMAIL PROTECTED] wrote:
  

I'm looking to do the following in OSG, and I wonder if I'm on the right
track (before wasting time needlessly): have two render processes run in
parallel on two different GPUs, have one render a scene to texture and let
this texture be read by the other process and mapped to an object in a
different scene. The problem: the rendering of the first scene to texture is
very slow and the rendering of the second scene is very fast.

I intend to solve it in the following way in pseudo-code:

- new CompositeViewer
- Add two Views
- Construct two contexts, one on localhost:0.0, one on localhost:0.1
- Attach contexts to cameras of corresponding Views
- Set composite viewer threading mode to thread-per-context

--- First process
- Set view camera mode to FBO and pre-render
- Add post-draw callback and render textures
- Download texture to host memory in post-draw callback
- (possibly add post-render camera to render textured screen quad as output)

--- Second process
- Add update-callback and regular texture
- Upload host memory to texture in update callback (if available,
non-blocking)

The downloading and uploading of textures uses multiple slots and regular
thread locking, to ensure we never read and write the same memory at the
same time. The second process doesn't block if no new texture is available;
it just continues using the old one.

Some questions. Will the two processes now run at independent frame rates,
or will the composite viewer synchronize them? I need them to run
independently. I read OSG does not support multi-threaded updating of the
scene graph. However, if I use two distinct scene graphs with two contexts,
I can _pull_ updates in an update callback from another thread, right? What
I cannot do is push updates at arbitrary times; that would make sense. How
do I make the TrackballManipulator work for only the first process? It seems
that as soon as I set that camera to FBO it just doesn't respond to events
(or maybe something else is wrong...  I added another orthogonal camera to
the view1->getCamera() that renders the screenquad in post-render mode).
Also, the second process camera is affected when I move the mouse in the
first process window. Is it sufficient to call
view2->getCamera()->setAllowEventFocus(false); to disable this behavior?
Finally, can I do this the same way with a shared context on a single GPU
(i.e. both on :0.0) sharing texture data 

Re: [osg-users] Question about views, contexts and threading

2008-11-17 Thread Robert Osfield
Hi Ferdi,

W.r.t. performance and stability of multi-threading the graphics: as
long as you have two GPUs, the most efficient way to drive them should
be multi-threaded. There is a caveat though - hardware and drivers
aren't always up to scratch, and even where they should be able to
manage multiple threads and multiple GPUs seamlessly, they fail to.

I'm poised to build a new machine based on the new Intel Core i7 and
X58 motherboard; it'll be interesting to see how well it scales.

W.r.t. PBO readback - it's very, very sensitive to the pixel formats you
use.  See the osgscreencapture example.

Robert.

On Mon, Nov 17, 2008 at 5:31 PM, Ferdi Smit [EMAIL PROTECTED] wrote:
 Thanks Robert. I did a quick test with two viewers from two threads and it
 appears to be working. Btw, from my experience, PBO doesn't seem to be any
 faster (and on some hardware much slower) for downloading textures to host
 than glReadPixels, while for uploads it is almost consistently faster.
 Anyway, that should not be a problem, even to code it manually.

 One question about the OpenGL driver, are you by any chance aware of any
 threading issues? Is it completely re-entrant from two different contexts
 and threads? With this two-thread setup, I see some occasional erratic
 fluctuation in drawing time in the osg performance hud for a completely
 still scene. The GPU performance is very stable, regardless of the load on
 the other card, but the drawing time (software) sometimes goes from
 something like 0.4 to 2.6 or 1.5 for a couple of frames. I do not notice
 this, or not as much, when using two separate processes instead of two
 threads. The only difference I can think of here is that the OpenGL driver
 part is in the same address space and maybe internally locks occasionally?
 Or is this nonsense?

 Anyway, the osg part seems to be fairly straightforward and simple like
 this. Thanks.


 Robert Osfield wrote:

 Hi Ferdi,

 osgViewer::CompositeViewer runs all of the views synchronously - one
 frame() call dispatches update, event, cull and draw traversals for
 all the views.  So for your case, where you want them to run async, this
 isn't supported.  Supporting it within CompositeViewer would really
 complicate the API, so it's not something I've gone for.

 What you will be able to do is use two separate Viewers.  You are
 likely to want to run two threads, one for each of the viewers' frame
 loops, as well.  To get the render-to-image result to the second viewer,
 all you need to do is assign the same osg::Image to the first viewer's
 Camera for it to copy to, and then attach the same osg::Image to a
 texture in the scene of the second viewer.  The OSG should
 automatically do the glReadPixels to the image data, dirty the Image,
 and then the texture will automatically update in the second viewer.
 You could potentially optimize things by using a PBO, but the
 off-the-shelf osg::PixelBufferObject isn't suitable for reading in this
 way, so you'll need to roll your own support for this.

 It's worth noting that I've never written an app like the above, so you
 are rather working on the bleeding edge.  I think it should work, or
 at least I can't spot any major problems that might appear.

 Robert.

 On Mon, Nov 17, 2008 at 9:37 AM, Ferdi Smit [EMAIL PROTECTED] wrote:


 I'm looking to do the following in OSG, and I wonder if I'm on the right
 track (before wasting time needlessly): have two render processes run in
 parallel on two different GPUs, have one render a scene to texture and
 let
 this texture be read by the other process and mapped to an object in a
 different scene. The problem: the rendering of the first scene to texture is
 very slow and the rendering of the second scene is very fast.

 I intend to solve it in the following way in pseudo-code:

 - new CompositeViewer
 - Add two Views
 - Construct two contexts, one on localhost:0.0, one on localhost:0.1
 - Attach contexts to cameras of corresponding Views
 - Set composite viewer threading mode to thread-per-context

 --- First process
 - Set view camera mode to FBO and pre-render
 - Add post-draw callback and render textures
 - Download texture to host memory in post-draw callback
 - (possibly add post-render camera to render textured screen quad as
 output)

 --- Second process
 - Add update-callback and regular texture
 - Upload host memory to texture in update callback (if available,
 non-blocking)

 The downloading and uploading of textures uses multiple slots and regular
 thread locking, to ensure we never read and write the same memory at the
 same time. The second process doesn't block if no new texture is available;
 it just continues using the old one.

 Some questions. Will the two processes now run at independent frame
 rates,
 or will the composite viewer synchronize them? I need them to run
 independently. I read OSG does not support multi-threaded updating of the
 scene graph. However, if I use two distinct scene graphs with two
 contexts,
 I can _pull_ updates in an update