Re: [Paraview] Parallel Streamtracer

2012-06-08 Thread Stephan Rogge
Hello Burlen,

thank you very much for your post. I really would like to test your plugin,
so I've started to build it. Unfortunately I got a lot of compiler
errors (e.g. vtkstd isn't used in the PV master anymore). Which PV version is
the base for your plugin?

Regards,
Stephan

-----Original Message-----
From: Burlen Loring [mailto:blor...@lbl.gov]
Sent: Thursday, June 7, 2012, 17:54
To: Stephan Rogge
Cc: 'Yuanxin Liu'; paraview@paraview.org
Subject: Re: [Paraview] Parallel Streamtracer

Hi Stephan,

I've experienced the scaling behavior that you report when I was working on
a project that required generating millions of streamlines for a topological
mapping algorithm interactively in ParaView. To get the required scaling I
wrote a stream tracer that uses a load-on-demand approach with a tunable block
cache, so that all ranks can integrate any streamline and stay busy
throughout the entire computation. It was very effective on our data and
I've used it to integrate 30 million streamlines in about 10 min on 256
cores. If you really need better scalability than the distributed-data
tracing approach implemented in PV, you might take a look at our work. The
downside of our approach is that, in order to provide the demand loading, the
reader has to implement a VTK object that provides an API giving the
integrator direct access to I/O functionality. In case you're interested, the
stream tracer class is vtkSQFieldTracer and our reader is vtkSQBOVReader.
The latest release can be found here:
https://github.com/burlen/SciberQuestToolKit/tarball/SQTK-20120531

Burlen

On 06/04/2012 02:21 AM, Stephan Rogge wrote:
 Hello Leo,

OK, I took the disk_out_ref.ex2 example data set and did some time
measurements. Remember, my machine has 4 cores + HyperThreading.

My first observation is that PV seems to have a problem with
distributing the data when the Multi-Core option (GUI) is enabled.
When PV is started with builtin Multi-Core, I was not able to apply a
stream tracer with more than 1,000 seed points (PV freezes and
never comes back). Otherwise, when the pvserver processes have been started
manually, I was able to set up to 100,000 seed points. Is this a bug?

Now let's have a look at the scaling performance. As you suggested,
I've used the D3 filter for distributing the data across the processes.
The stream tracer execution time for 10,000 seed points:

##   Builtin: 10.063 seconds
##   1 MPI process (no D3): 10.162 seconds
##   4 MPI processes: 15.615 seconds
##   8 MPI processes: 14.103 seconds

and 100,000 seed points:

##   Builtin: 100.603 seconds
##   1 MPI process (no D3): 100.967 seconds
##   4 MPI processes: 168.1 seconds
##   8 MPI processes: 171.325 seconds

I cannot see any positive scaling behavior here. Maybe this example is
not appropriate for scaling measurements?
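For reference, the speed-up implied by these timings is below 1 in every parallel configuration; a quick check (plain Python, numbers copied from the lists above):

```python
# Speed-up = serial time / parallel time, using the builtin run as the
# serial baseline. Values below 1.0 mean the parallel run was slower.
timings_10k = {"builtin": 10.063, "mpi4": 15.615, "mpi8": 14.103}
timings_100k = {"builtin": 100.603, "mpi4": 168.1, "mpi8": 171.325}

def speedup(timings):
    base = timings["builtin"]
    return {k: round(base / v, 2) for k, v in timings.items() if k != "builtin"}

print(speedup(timings_10k))   # {'mpi4': 0.64, 'mpi8': 0.71}
print(speedup(timings_100k))  # {'mpi4': 0.6, 'mpi8': 0.59}
```

So both runs show a slowdown of roughly 1.4x to 1.7x rather than any speed-up.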

One more thing: I've visualized the vtkProcessId and saw that the
whole vector field is partitioned. I thought that each streamline is
integrated in its own process, but it seems that this is not the case.
This could explain my scaling issues: for small vector fields,
the overhead of synchronization becomes too large and decreases the
overall performance.

My suggestion is to have a parallel StreamTracer which is built for a
single machine with several threads. Could it be worth randomly
distributing the seeds over all available (local) processes? Of course,
each process would have access to the whole vector field.
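A sketch of that idea (plain Python, illustrative only; the function name and the shuffle-plus-round-robin scheme are assumptions, not ParaView API):

```python
import random

def distribute_seeds(seeds, num_procs, shuffle=True):
    """Assign seed points to local processes. Shuffling first helps
    balance the load when neighboring seeds have similar integration cost."""
    seeds = list(seeds)
    if shuffle:
        random.shuffle(seeds)
    # Round-robin assignment: the i-th seed goes to process i % num_procs.
    buckets = [[] for _ in range(num_procs)]
    for i, seed in enumerate(seeds):
        buckets[i % num_procs].append(seed)
    return buckets

# Example: 10 seeds over 4 processes -> bucket sizes [3, 3, 2, 2].
buckets = distribute_seeds(range(10), 4)
print([len(b) for b in buckets])  # [3, 3, 2, 2]
```

Since every process sees the whole vector field, no seed ever has to migrate between processes during integration.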

 Cheers,
 Stephan



From: Yuanxin Liu [mailto:leo@kitware.com]
Sent: Friday, June 1, 2012, 16:13
To: Stephan Rogge
Cc: Andy Bauer; paraview@paraview.org
Subject: Re: [Paraview] Parallel Streamtracer

 Hi, Stephan,
   I did measure the performance at some point and was able to get
fairly decent speed-up with more processors. So I am surprised you are
seeing huge latency.

Of course, the performance is sensitive to the input.  It is also 
 sensitive to how readers distribute data. So, one thing you might want 
 to try is to attach the D3 filter to the reader.

   If that doesn't help, I will be happy to get your data and take a look.

 Leo

On Fri, Jun 1, 2012 at 1:54 AM, Stephan Rogge stephan.ro...@tu-cottbus.de
wrote:
 Leo,

As I mentioned in my initial post of this thread: I used the
up-to-date master branch of ParaView, which means I have already used
your implementation.

I can imagine that parallelizing this algorithm can be very tough. And I
can see that distributing the calculation over 8 processes does not lead
to nice scaling.

But I don't understand the huge amount of latency when using the
StreamTracer in Cave mode with two viewports and two pvserver
processes on the same machine (with an extra machine for the client). I guess
the tracer filter is applied for each viewport separately? This would
be OK as long as both filter executions run in parallel. And I doubt that
this is the case.

 Can you help to clarify my problem?

 Regards

Re: [Paraview] Parallel Streamtracer

2012-06-08 Thread Stephan Rogge
Someone told me that you have to clear your build directory completely and
start a fresh PV build. 

Stephan

-----Original Message-----
From: burlen [mailto:burlen.lor...@gmail.com]
Sent: Friday, June 8, 2012, 16:21
To: Stephan Rogge
Cc: 'Yuanxin Liu'; paraview@paraview.org
Subject: Re: [Paraview] Parallel Streamtracer

Hi Stephan,

Oh, thanks for the update, I wasn't aware of these changes. I have been
working with 3.14.1.

Burlen

Re: [Paraview] Parallel Streamtracer

2012-06-08 Thread burlen

OK, you had me a little worried there ;)

I will send you some instructions and example data to test with; our
network is down due to an unexpected power outage, so it won't be today.


Burlen


Re: [Paraview] Parallel Streamtracer

2012-06-08 Thread burlen

Hi Leo,

Thanks, yes, please send your fixes, or you could also push them to
GitHub. Whichever you prefer.


Burlen

On 06/08/2012 09:10 AM, Yuanxin Liu wrote:

Hi,
  I have recently gotten Burlen's code and updated it to work with the
latest ParaView.  Aside from vtkstd, there are also a few backward-incompatible
VTK changes (see the VTK 6.0 section on the VTK wiki).
But it is not too much work. I will be happy to send either of you my
code changes if you need a reference.


Leo



Re: [Paraview] Parallel Streamtracer

2012-06-08 Thread burlen

Hi Stephan,

As promised, here are instructions and a small test dataset:
http://www.hpcvis.com/vis/sq-field-tracer.html


Burlen


Re: [Paraview] Parallel Streamtracer

2012-06-07 Thread Burlen Loring



From: Yuanxin Liu [mailto:leo@kitware.com]
Sent: Thursday, May 31, 2012, 21:33
To: Stephan Rogge
Cc: Andy Bauer; paraview@paraview.org
Subject: Re: [Paraview] Parallel Streamtracer

It is in the current VTK and ParaView master.  The class is
vtkPStreamTracer. 


Leo
On Thu, May 31, 2012 at 3:31 PM, Stephan Rogge stephan.ro...@tu-cottbus.de
wrote:
Hi, Andy and Leo,

thanks for your replies.

Is it possible to get this new implementation? I would like to give it a try.

Regards,
Stephan


Re: [Paraview] Parallel Streamtracer

2012-06-06 Thread Berk Geveci
By the way, did you make sure to apply D3? disk_out_ref.ex2 is not
partitioned, so by default it would be loaded entirely onto MPI rank 0.
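That point can be illustrated with a toy model (plain Python with hypothetical unit costs, not ParaView code): if all blocks live on rank 0, only rank 0 can advance streamlines, so the wall time matches the serial case no matter how many ranks run.

```python
# Toy model: a streamline can only be advanced by the rank that owns the
# block containing its current position, and each seed costs 1 unit of work.
def trace_time(num_seeds, blocks_per_rank):
    """Wall time = load of the busiest rank, with work split in
    proportion to the blocks each rank owns (communication ignored)."""
    total_blocks = sum(blocks_per_rank)
    loads = [num_seeds * (b / total_blocks) for b in blocks_per_rank]
    return max(loads)

# Without D3: the whole (unpartitioned) dataset sits on rank 0.
no_d3 = trace_time(10_000, [4, 0, 0, 0])
# With D3: blocks redistributed evenly over 4 ranks.
with_d3 = trace_time(10_000, [1, 1, 1, 1])

print(no_d3)    # 10000.0 -> identical to the serial case, no speed-up
print(with_d3)  # 2500.0  -> ideal 4x, before communication overhead
```

In reality streamlines cross block boundaries, so the D3 case pays synchronization costs on top of this ideal figure, which is the overhead discussed earlier in the thread.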

 Am 31.05.2012 um 17:48 schrieb Yuanxin Liu leo@kitware.com:
 Hi, Stephan,
The previous implementation only has serial performance:  It traces the
 streamlines one at a time and never starts a new streamline until the
 previous one finishes.  With communication overhead, it is not surprising
 it
 got slower.

   My new implementation is able to let the processes working on different
 streamlines simultaneously and should scale much better.

 Leo

 On Thu, May 31, 2012 at 11:27 AM, Andy Bauer andy.ba...@kitware.com
 wrote:
 Hi Stephan,

 The parallel stream tracer uses the partitioning of the grid to determine
 which process does the integration. When the streamline exits the subdomain
 of a process there is a search to see if it enters a subdomain assigned to
 any other processes before figuring it whether it has left the entire
 domain.

 Leo, copied here, has been improving the streamline implementation

Re: [Paraview] Parallel Streamtracer

2012-06-06 Thread Stephan Rogge
Hello Berk,

absolutely. After applying both filters, D3 and StreamTracer, I visualized
the partitions with vtkProcessId to check whether D3 was applied or not, and
I could see that the streamlines had different (homogeneous) colors
depending on their region. Note that the D3 filter is only applied when more
than one MPI process is running.

To make things clearer:

##   Builtin (no D3): 10.063 seconds
##   1 MPI-Process (no D3): 10.162 seconds
##   4 MPI-Processes (D3): 15.615 seconds
##   8 MPI-Processes (D3): 14.103 seconds

and 100,000 seed points:

##   Builtin (no D3): 100.603 seconds
##   1 MPI-Process (no D3): 100.967 seconds
##   4 MPI-Processes (D3): 168.1 seconds
##   8 MPI-Processes (D3): 171.325 seconds

Sorry for the confusion.

Regards,
Stephan

Von: Berk Geveci [mailto:berk.gev...@kitware.com] 
Gesendet: Donnerstag, 7. Juni 2012 02:53
An: Stephan Rogge
Cc: Yuanxin Liu; paraview@paraview.org
Betreff: Re: [Paraview] Parallel Streamtracer

By the way, did you make sure to apply D3? disk_out_ref.ex2 is not
partitioned so by default it would be loaded entirely onto MPI rank 0.

Re: [Paraview] Parallel Streamtracer

2012-06-05 Thread Stephan Rogge
Thanks, Leo.

That sounds great. I'm looking forward to having a parallel Stream Tracer
for small vector fields.

Stephan


Re: [Paraview] Parallel Streamtracer

2012-06-04 Thread Stephan Rogge
Hello Leo,

ok, I took the disk_out_ref.ex2 example data set and did some time
measurements. Remember, my machine has 4 Cores + HyperThreading.

My first observation is that PV seems to have a problem with distributing
the data when the Multi-Core option (GUI) is enabled. When PV is started
with builtin Multi-Core, I was not able to apply a stream tracer with more
than 1000 seed points (PV freezes and never comes back). However, when the
pvserver processes were started manually, I was able to use up to 100,000
seed points. Is this a bug?

Now let's have a look at the scaling performance. As you suggested, I've
used the D3 filter to distribute the data across the processes. The stream
tracer execution time for 10,000 seed points:

##   Builtin: 10.063 seconds
##   1 MPI-Process (no D3): 10.162 seconds
##   4 MPI-Processes: 15.615 seconds
##   8 MPI-Processes: 14.103 seconds

and 100,000 seed points:

##   Builtin: 100.603 seconds
##   1 MPI-Process (no D3): 100.967 seconds
##   4 MPI-Processes: 168.1 seconds
##   8 MPI-Processes: 171.325 seconds

I cannot see any positive scaling behavior here. Maybe this example is not
appropriate for scaling measurements?
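For reference, the speed-up implied by the timings quoted above can be checked with a few lines of Python (the numbers are copied from this message; a speed-up below 1.0 means the parallel run was slower than serial):

```python
# Parallel speed-up from the timings quoted above:
# speedup(N) = T(1 process) / T(N processes).
timings_10k = {1: 10.162, 4: 15.615, 8: 14.103}    # seconds, 10,000 seeds
timings_100k = {1: 100.967, 4: 168.1, 8: 171.325}  # seconds, 100,000 seeds

def speedups(timings):
    t1 = timings[1]
    return {n: round(t1 / t, 2) for n, t in timings.items() if n > 1}

print(speedups(timings_10k))   # {4: 0.65, 8: 0.72}
print(speedups(timings_100k))  # {4: 0.6, 8: 0.59}
```

Every value is below 1.0, which matches the observation that adding processes only slowed this case down.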

One more thing: I've visualized the vtkProcessId and saw that the whole
vector field is partitioned. I thought that each streamline is integrated
in its own process, but it seems that this is not the case. This could
explain my scaling issues: for small vector fields, the synchronization
overhead becomes too large and decreases the overall performance.

My suggestion is to have a parallel StreamTracer which is built for a single
machine with several threads. Could it be worthwhile to randomly distribute
the seeds over all available (local) processes? Of course, each process
would have access to the whole vector field.

Cheers,
Stephan




Re: [Paraview] Parallel Streamtracer

2012-06-04 Thread Yuanxin Liu
Hi, Stephan,
  I will look into the multi-core issue as well as the performance issue.

  Some quick answers:

  - Yes, the whole vector field is partitioned and the streamlines are
passed from one process to another. This is why the performance can be
highly sensitive to how the data are distributed and how the streamlines
travel between data partitions.

  - Your suggestion makes sense if the data is small enough to be run on a
single machine. This is definitely something we would like to do in the
future. Right now, the implementation is more targeted towards handling
large data that have to be distributed across multiple machines.

Leo
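Leo's description above, in which the field is partitioned across ranks and a streamline is handed from one rank to the next as it crosses block boundaries, can be sketched in miniature. This is a toy model (a 1-D decomposition with a constant velocity field), not vtkPStreamTracer's actual code:

```python
# Toy model of distributed streamline integration: the 1-D domain [0, 4)
# is split into equal blocks, one per "rank". A rank can only integrate
# while the particle is inside its own block; when the particle exits,
# the trace is handed to the rank owning the new position. We count the
# handoffs to show where communication cost comes from.
NBLOCKS = 4
BLOCK_WIDTH = 1.0

def owner(x):
    """Rank whose block contains position x."""
    return int(x // BLOCK_WIDTH)

def trace(x, v=0.3, dt=1.0, xmax=NBLOCKS * BLOCK_WIDTH):
    path, handoffs, rank = [x], 0, owner(x)
    while x + v * dt < xmax:
        x += v * dt                # one integration step (constant field)
        path.append(x)
        if owner(x) != rank:       # left the local block:
            rank = owner(x)        # hand the streamline to the new owner
            handoffs += 1
    return path, handoffs

path, handoffs = trace(0.0)
print(handoffs)  # 3 (the trace crossed three block boundaries)
```

With many short streamlines in a small field, these handoffs dominate, which is consistent with the scaling numbers reported earlier in the thread.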



Re: [Paraview] Parallel Streamtracer

2012-06-01 Thread Yuanxin Liu
Hi, Stephan,
  I did measure the performance at some point and was able to get fairly
decent speed-up with more processors, so I am surprised you are seeing such
huge latency.

  Of course, the performance is sensitive to the input.  It is also
sensitive to how readers distribute data. So, one thing you might want to
try is to attach the D3 filter to the reader.

  If that doesn't help,  I will be happy to get your data and take a look.

Leo
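The reason attaching D3 matters can be illustrated with a toy decomposition: an unpartitioned file lands entirely on rank 0 (so the other ranks idle), while a D3-style redistribution gives every rank a spatially compact piece. The equal-slab split below is only an illustration, not D3's actual algorithm:

```python
# Toy contrast between a reader that puts the whole dataset on rank 0
# (what happens with an unpartitioned file) and a D3-style spatial
# redistribution that gives every rank a compact slab of the domain.

def load_unpartitioned(points, nranks):
    """Everything lands on rank 0; the other ranks sit idle."""
    parts = [[] for _ in range(nranks)]
    parts[0] = list(points)
    return parts

def redistribute_by_slab(points, nranks, xmin=0.0, xmax=1.0):
    """Assign each point to a rank by equal slabs along x."""
    width = (xmax - xmin) / nranks
    parts = [[] for _ in range(nranks)]
    for x, y in points:
        rank = min(int((x - xmin) / width), nranks - 1)
        parts[rank].append((x, y))
    return parts

pts = [(i / 10.0, 0.0) for i in range(10)]            # 10 sample points
print([len(p) for p in load_unpartitioned(pts, 4)])   # [10, 0, 0, 0]
print([len(p) for p in redistribute_by_slab(pts, 4)]) # [3, 2, 3, 2]
```

With the first layout, a parallel filter downstream has nothing to work on outside rank 0, which is why running D3 before the stream tracer changes the timings.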


On Fri, Jun 1, 2012 at 1:54 AM, Stephan Rogge
stephan.ro...@tu-cottbus.dewrote:

 Leo,

 As I mentioned in my initial post of this thread: I used the up-to-date
 master branch of ParaView. Which means I have already used your
 implementation.

 I can imagine that parallelizing this algorithm can be very tough, and I
 can see that distributing the calculation over 8 processes does not lead
 to nice scaling.

 But I don't understand the huge amount of latency when using the
 StreamTracer in Cave mode with two viewports and two pvserver processes
 on the same machine (plus an extra machine for the client). I guess the
 tracer filter is applied for each viewport separately? This would be OK as
 long as both filter executions run in parallel, and I doubt that this is
 the case.

 Can you help to clarify my problem?

 Regards,
 Stephan


 Von: Yuanxin Liu [mailto:leo@kitware.com]
 Gesendet: Donnerstag, 31. Mai 2012 21:33
 An: Stephan Rogge
 Cc: Andy Bauer; paraview@paraview.org
 Betreff: Re: [Paraview] Parallel Streamtracer

 It is in the current VTK and ParaView master.  The class is
 vtkPStreamTracer.

 Leo
 On Thu, May 31, 2012 at 3:31 PM, Stephan Rogge 
 stephan.ro...@tu-cottbus.de
 wrote:
 Hi, Andy and Leo,

 thanks for your replies.

 Is it possible to get this new implementation? I would like to give it a try.

 Regards,
 Stephan

 Am 31.05.2012 um 17:48 schrieb Yuanxin Liu leo@kitware.com:
 Hi, Stephan,
   The previous implementation only has serial performance: it traces the
 streamlines one at a time and never starts a new streamline until the
 previous one finishes. With communication overhead, it is not surprising
 that it got slower.

   My new implementation lets the processes work on different
 streamlines simultaneously and should scale much better.

 Leo


[Paraview] Parallel Streamtracer

2012-05-31 Thread Stephan Rogge
Hello,

I have a question related to the parallelism of the stream tracer: As I
understand the code, each line integration (trace) is processed in its
own MPI process. Right?

To test the scalability of the stream tracer, I've loaded a structured
(curvilinear) grid, applied the filter with a seed resolution of 1500, and
checked the timings in single-thread and multi-thread (Multi Core enabled
in the PV GUI) situations.

I was really surprised that multi-core slows the execution down to 4
seconds; the single core takes only 1.2 seconds. Data migration cannot be
the explanation for that behavior (0.5 seconds). What is the problem here?

Please see attached some statistics...

Data:
* Structured (Curvilinear) Grid
* 244030 Cells
* 37 MB Memory

System:
* Intel i7-2600K (4 Cores + HT = 8 Threads)
* 16 GB Ram
* Windows 7 64 Bit
* ParaView (master-branch, 64 bit compilation)

#
Single Thread (Seed resolution 1500):
#

Local Process
Still Render,  0.014 seconds
RenderView::Update,  1.222 seconds
   vtkPVView::Update,  1.222 seconds
      Execute vtkStreamTracer id: 2184,  1.214 seconds
Still Render,  0.015 seconds

#
Eight Threads (Seed resolution 1500):
#

Local Process
Still Render,  0.029 seconds
RenderView::Update,  4.134 seconds
vtkSMDataDeliveryManager: Deliver Geome,  0.619 seconds
   FullRes Data Migration,  0.619 seconds
Still Render,  0.042 seconds
   OpenGL Dev Render,  0.01 seconds


Render Server, Process 0
RenderView::Update,  4.134 seconds
   vtkPVView::Update,  4.132 seconds
      Execute vtkStreamTracer id: 2193,  3.941 seconds
FullRes Data Migration,  0.567 seconds
   Dataserver gathering to 0,  0.318 seconds
   Dataserver sending to client,  0.243 seconds

Render Server, Process 1
Execute vtkStreamTracer id: 2193,  3.939 seconds

Render Server, Process 2
Execute vtkStreamTracer id: 2193,  3.938 seconds

Render Server, Process 3
Execute vtkStreamTracer id: 2193,  4.12 seconds

Render Server, Process 4
Execute vtkStreamTracer id: 2193,  3.938 seconds

Render Server, Process 5
Execute vtkStreamTracer id: 2193,  3.939 seconds

Render Server, Process 6
Execute vtkStreamTracer id: 2193,  3.938 seconds

Render Server, Process 7
Execute vtkStreamTracer id: 2193,  3.939 seconds

Cheers,
Stephan


___
Powered by www.kitware.com

Visit other Kitware open-source projects at 
http://www.kitware.com/opensource/opensource.html

Please keep messages on-topic and check the ParaView Wiki at: 
http://paraview.org/Wiki/ParaView

Follow this link to subscribe/unsubscribe:
http://www.paraview.org/mailman/listinfo/paraview


Re: [Paraview] Parallel Streamtracer

2012-05-31 Thread Andy Bauer
Hi Stephan,

The parallel stream tracer uses the partitioning of the grid to determine
which process does the integration. When the streamline exits the subdomain
of a process, there is a search to see if it enters a subdomain assigned to
any other process before figuring out whether it has left the entire
domain.

Leo, copied here, has been improving the streamline implementation inside
of VTK so you may want to get his newer version. It is a pretty tough
algorithm to parallelize efficiently without making any assumptions on the
flow or partitioning.

Andy
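The subdomain search Andy describes can be sketched as follows (toy axis-aligned boxes, not VTK's implementation): when a streamline leaves the local block, look up which rank's subdomain contains the exit point, or conclude that the particle has left the whole domain.

```python
# Toy subdomain lookup for a 2x2 block decomposition of the unit square.
# Each rank owns an axis-aligned bounding box (xmin, xmax, ymin, ymax).
SUBDOMAINS = {
    0: (0.0, 0.5, 0.0, 0.5),
    1: (0.5, 1.0, 0.0, 0.5),
    2: (0.0, 0.5, 0.5, 1.0),
    3: (0.5, 1.0, 0.5, 1.0),
}

def find_owner(point):
    """Return the rank whose box contains `point`, or None if the
    point has left the entire domain."""
    x, y = point
    for rank, (x0, x1, y0, y1) in SUBDOMAINS.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return rank
    return None

print(find_owner((0.75, 0.25)))  # 1: continue integration on rank 1
print(find_owner((1.5, 0.25)))   # None: the streamline left the domain
```

In the real filter, this search (plus the transfer of the streamline state) is the communication step that makes the algorithm hard to scale without assumptions about the flow or the partitioning.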

On Thu, May 31, 2012 at 4:16 AM, Stephan Rogge
stephan.ro...@tu-cottbus.dewrote:

 Hello,

 I have a question related to the parallelism of the stream tracer: As I
 understand the code right, each line integration (trace) is processed in an
 own MPI process. Right?

 To test the scalability of the Stream tracer I've load a structured
 (curvilinear) grid and applied the filter with a Seed resolution of 1500
 and
 check the timings in a single and multi-thread (Multi Core enabled in PV
 GUI) situation.

 I was really surprised that multi core slows done the execution time to 4
 seconds. The single core takes only 1.2 seconds. Data migration cannot be
 the explanation for that behavior (0.5 seconds). What is the problem here?

 Please see attached some statistics...

 Data:
 * Structured (Curvilinear) Grid
 * 244030 Cells
 * 37 MB Memory

 System:
 * Intel i7-2600K (4 Cores + HT = 8 Threads)
 * 16 GB Ram
 * Windows 7 64 Bit
 * ParaView (master-branch, 64 bit compilation)

 #
 Single Thread (Seed resolution 1500):
 #

 Local Process
 Still Render,  0.014 seconds
 RenderView::Update,  1.222 seconds
vtkPVView::Update,  1.222 seconds
Execute vtkStreamTracer id: 2184,  1.214 seconds
 Still Render,  0.015 seconds

 #
 Eight Threads (Seed resolution 1500):
 #

 Local Process
 Still Render,  0.029 seconds
 RenderView::Update,  4.134 seconds
 vtkSMDataDeliveryManager: Deliver Geome,  0.619 seconds
FullRes Data Migration,  0.619 seconds
 Still Render,  0.042 seconds
OpenGL Dev Render,  0.01 seconds


 Render Server, Process 0
 RenderView::Update,  4.134 seconds
vtkPVView::Update,  4.132 seconds
Execute vtkStreamTracer id: 2193,  3.941 seconds
 FullRes Data Migration,  0.567 seconds
Dataserver gathering to 0,  0.318 seconds
Dataserver sending to client,  0.243 seconds

 Render Server, Process 1
 Execute vtkStreamTracer id: 2193,  3.939 seconds

 Render Server, Process 2
 Execute vtkStreamTracer id: 2193,  3.938 seconds

 Render Server, Process 3
 Execute vtkStreamTracer id: 2193,  4.12 seconds

 Render Server, Process 4
 Execute vtkStreamTracer id: 2193,  3.938 seconds

 Render Server, Process 5
 Execute vtkStreamTracer id: 2193,  3.939 seconds

 Render Server, Process 6
 Execute vtkStreamTracer id: 2193,  3.938 seconds

 Render Server, Process 7
 Execute vtkStreamTracer id: 2193,  3.939 seconds

 Cheers,
 Stephan




Re: [Paraview] Parallel Streamtracer

2012-05-31 Thread Yuanxin Liu
Hi, Stephan,
   The previous implementation effectively has only serial performance: it
traces the streamlines one at a time and never starts a new streamline until
the previous one finishes. With the added communication overhead, it is not
surprising that it got slower.

  My new implementation lets the processes work on different streamlines
simultaneously and should scale much better.
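
The contrast Leo draws (one streamline at a time versus keeping all workers busy) can be illustrated with a toy pool-based sketch; the velocity field, seeds, and `integrate` function here are made up for illustration and are not the vtkPStreamTracer implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def integrate(seed, dt=0.01, steps=100):
    """Toy forward-Euler integration of v(x) = -x from one seed point."""
    x = seed
    for _ in range(steps):
        x += dt * (-x)
    return seed, x

# Hand each seed to whichever worker is free, instead of finishing one
# streamline before the next is allowed to start.
seeds = [0.5 * i for i in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(integrate, seeds))
```

Note that Python threads won't actually speed up this CPU-bound toy because of the GIL; the sketch only illustrates the scheduling idea of overlapping independent streamline integrations, which in the real filter happens across MPI ranks.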

Leo




Re: [Paraview] Parallel Streamtracer

2012-05-31 Thread Stephan Rogge
Hi, Andy and Leo,

thanks for your replies.

Is it possible to get this new implementation? I would like to give it a try.

Regards,
Stephan



Re: [Paraview] Parallel Streamtracer

2012-05-31 Thread Yuanxin Liu
It is in the current VTK and ParaView master.  The class is
vtkPStreamTracer.

Leo
