Hi John,
Thanks for the insight! We have image data and rectilinear grids (the current
size is 512^3, not that big, but the grids are growing as we get more CPU
hours). We use seed points from a plane with a higher resolution than the
grid, and that plane intersects a number of subdomains, so there is some
potential to integrate in parallel. At this point I am not sure it will
help (and after reading your and others' comments, even less so). There is a
big serial component to the algorithm, and the load is imbalanced.
Great that you found some ways to boost the performance! Any speedup
will be very helpful in this application.
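To make the load-imbalance point concrete, here is a small sketch that counts how many seeds from a single plane land in each subdomain at the start of tracing. Everything in it is a hypothetical toy setup, not ParaView's decomposition: the domain is assumed to be the unit cube split into equal blocks (one per process), and the seeds are assumed to lie on the plane z = z0 at a resolution finer than the blocks. Streamlines will later wander into other blocks, but the initial work is concentrated in the one layer of blocks the plane cuts.

```python
"""Sketch: how seeds from one plane spread across a block decomposition.

Hypothetical setup (not ParaView's): the unit cube is split into
nb x nb x nb equal blocks, one per process, and seeds lie on z = z0.
"""
from collections import Counter


def block_index(x, y, z, nb):
    """Return the (i, j, k) block containing point (x, y, z) in [0, 1]^3."""
    # min() keeps points on the upper boundary (coordinate == 1.0) in the last block.
    return tuple(min(int(c * nb), nb - 1) for c in (x, y, z))


def seeds_per_block(nb, seed_res, z0):
    """Count plane seeds landing in each block; blocks with no seeds get 0."""
    counts = Counter()
    for a in range(seed_res):
        for b in range(seed_res):
            # Cell-centred seed positions on the plane z = z0.
            x = (a + 0.5) / seed_res
            y = (b + 0.5) / seed_res
            counts[block_index(x, y, z0, nb)] += 1
    # List empty blocks explicitly so the imbalance is visible.
    return {(i, j, k): counts[(i, j, k)]
            for i in range(nb) for j in range(nb) for k in range(nb)}


if __name__ == "__main__":
    counts = seeds_per_block(nb=4, seed_res=64, z0=0.3)
    busy = sum(1 for c in counts.values() if c > 0)
    print(f"{busy} of {len(counts)} blocks own seeds")
```

With a 4x4x4 decomposition, only the 16 blocks in the layer the plane cuts start with any work; the other 48 processes wait until streamlines are handed to them.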
Burlen
John Biddiscombe wrote:
Burlen
I have had performance issues with the Distributed Stream Tracer, but
I found that, in general, its lack of optimization for parallel
operation was not the main trouble. If you are using unstructured grids
and they are large (in my case 20 million cells in a block), then most
of the time was spent building the cell links that FindCell uses to
locate the cell in which an integration point lies. I modified the
stream tracer interpolation to use a BSP tree (or CellLocator) and found
a huge improvement in execution time (minutes instead of hours).
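Why a locator helps so much: without one, finding the cell that contains a point means testing cells one by one, and a tracer does this for every integration step. The sketch below contrasts a brute-force scan with a simple spatial index. It swaps in a uniform-grid bucket locator, a simpler relative of the BSP tree / cell locator mentioned above and not VTK's implementation; the triangle mesh and all function and class names are hypothetical.

```python
"""Sketch: brute-force point location vs. a bucket cell locator.

Swapped-in technique: a uniform-grid bucket locator (not VTK's BSP tree).
The 2D triangle mesh and all names here are hypothetical.
"""


def make_mesh(n):
    """Split the unit square into n*n quads, each cut into two triangles."""
    tris, h = [], 1.0 / n
    for i in range(n):
        for j in range(n):
            x0, y0 = i * h, j * h
            p00, p10 = (x0, y0), (x0 + h, y0)
            p01, p11 = (x0, y0 + h), (x0 + h, y0 + h)
            tris.append((p00, p10, p11))
            tris.append((p00, p11, p01))
    return tris


def contains(tri, p, eps=1e-12):
    """Barycentric point-in-triangle test (points on an edge count as inside)."""
    (ax, ay), (bx, by), (cx, cy) = tri
    px, py = p
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    l1 = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    l2 = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return min(l1, l2, 1.0 - l1 - l2) >= -eps


def find_cell_brute(tris, p):
    """O(N) scan over every cell: the per-point cost without a locator."""
    for idx, tri in enumerate(tris):
        if contains(tri, p):
            return idx
    return -1


class BucketLocator:
    """Bin each cell's bounding box into a uniform grid of buckets once,
    then test only the few cells in the bucket holding the query point."""

    def __init__(self, tris, nbins):
        self.tris, self.nbins = tris, nbins
        self.buckets = {}
        for idx, tri in enumerate(tris):
            xs = [v[0] for v in tri]
            ys = [v[1] for v in tri]
            for bi in range(self._bin(min(xs)), self._bin(max(xs)) + 1):
                for bj in range(self._bin(min(ys)), self._bin(max(ys)) + 1):
                    self.buckets.setdefault((bi, bj), []).append(idx)

    def _bin(self, c):
        # Clamp so coordinates on or outside the domain boundary map to a valid bucket.
        return min(max(int(c * self.nbins), 0), self.nbins - 1)

    def find_cell(self, p):
        for idx in self.buckets.get((self._bin(p[0]), self._bin(p[1])), []):
            if contains(self.tris[idx], p):
                return idx
        return -1


if __name__ == "__main__":
    import random
    tris = make_mesh(64)  # 8192 triangles
    loc = BucketLocator(tris, nbins=64)
    rng = random.Random(0)
    p = (rng.random(), rng.random())
    print(loc.find_cell(p), find_cell_brute(tris, p))
```

The build cost is paid once, after which each lookup touches only a handful of cells instead of all of them; a tree-based locator such as a BSP tree plays the same role while adapting better to non-uniform cell sizes.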
Secondly, the parallelization of the stream tracer is an inherent
problem. One cannot integrate the streamline in block 2 until it has
reached a boundary in block 1; one must wait until the streamline
traverses one block before passing it to the next. In practice, though,
the implementation could be improved with more intelligent seeding and
sending/receiving of streamline seeds, etc., between iterations.
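The serial dependency can be seen in a toy model of round-based hand-off. This is a hypothetical simulation, not ParaView's algorithm: each process owns one block, and each streamline is described by the ordered list of blocks it passes through (known in advance here only to keep the toy small; a real tracer discovers it by integrating). In every round, each process advances the streamlines it holds to its block boundary and then sends them on. A streamline that crosses k blocks needs k rounds no matter how many processes are available.

```python
"""Toy model of round-based streamline hand-off between blocks.

Hypothetical (not ParaView's code): one block per process; each
streamline is the ordered list of block ids it passes through.
"""


def simulate(paths):
    """Return (rounds, busy), where busy[r] lists the processes working in round r."""
    # Each active streamline: (its path, index of the block it is currently in).
    active = [(path, 0) for path in paths if path]
    busy = []
    while active:
        # Processes that own at least one streamline this round.
        busy.append(sorted({path[pos] for path, pos in active}))
        # Integrate to the block boundary, then hand off to the next block on the path.
        active = [(path, pos + 1) for path, pos in active if pos + 1 < len(path)]
    return len(busy), busy


def utilization(busy, nprocs):
    """Fraction of process-rounds spent doing work (1.0 = perfectly balanced)."""
    if not busy:
        return 0.0
    return sum(len(b) for b in busy) / (len(busy) * nprocs)


if __name__ == "__main__":
    # 10 streamlines, all seeded in block 0, each crossing blocks 0..7 in order.
    rounds, busy = simulate([list(range(8))] * 10)
    print(rounds, busy, utilization(busy, 8))
```

In the example, 8 processes take 8 rounds with only one of them busy in each round (12.5% utilization), which is the "wait until the streamline traverses one block" effect described above; better seeding aims to keep more blocks busy per round.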
The particle tracer code could be modified to produce streamlines in a
serial or distributed manner and ought to give a 'reasonably' optimal
solution to the problem, but in fact the chaps at Kitware are at the
moment (they tell me) in the process of revamping the streamline code
to make use of CellLocators, and for this reason I recently committed
my BSP tree code.
Here's how to check where your bottleneck is.
Find a large StructuredGrid dataset which is loaded in parallel.
Generate streamlines and time it. Convert the grid to an UnstructuredGrid
and do the same. If test 1 takes 1 minute and test 2 takes 1 hour, then
parallelization isn't the real issue; the type of grid being used is.
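The comparison above can be wrapped in a small timing harness. This is a sketch under assumptions: the two callables stand for whatever runs your streamline pipeline on the structured and on the converted unstructured data (they are placeholders you would supply, for example a pvpython script), and the 10x threshold is an arbitrary illustrative cut-off, not a ParaView rule.

```python
"""Sketch: time the same streamline run on two grid types and compare.

The callables passed to best_time() are hypothetical placeholders for
your own pipeline runs; the 10x threshold is an arbitrary assumption.
"""
import time


def best_time(fn, repeats=3):
    """Best-of-N wall-clock time of fn(), to damp noise from other jobs."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    return best


def slowdown(t_structured, t_unstructured):
    """How many times slower the unstructured run is."""
    return t_unstructured / t_structured


def grid_is_bottleneck(t_structured, t_unstructured, threshold=10.0):
    """True when the unstructured run is much slower than the structured one,
    which points at cell lookup rather than the parallel hand-off."""
    return slowdown(t_structured, t_unstructured) >= threshold


if __name__ == "__main__":
    # The numbers from the example: 1 minute structured vs. 1 hour unstructured.
    print(slowdown(60.0, 3600.0), grid_is_bottleneck(60.0, 3600.0))
```

With the example figures the unstructured run is 60 times slower, so the grid (and its cell lookup) rather than the parallel scheme would be the first thing to look at.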
JB
We've been using the distributed stream tracer to generate 100s-1000s
of streamlines per time step. It's very slow, and it doesn't scale
at all. The class comments say as much. I'm sure there is a reason
why this implementation was chosen. Is there something that generally
prevents a truly parallel implementation? Is there a better
implementation available out there?
There was a post about this a while back:
http://www.paraview.org/pipermail/paraview/2009-July/012959.html
What's the status?
Thanks
Burlen
_______________________________________________
Powered by www.kitware.com
Visit other Kitware open-source projects at
http://www.kitware.com/opensource/opensource.html
Please keep messages on-topic and check the ParaView Wiki at:
http://paraview.org/Wiki/ParaView
Follow this link to subscribe/unsubscribe:
http://www.paraview.org/mailman/listinfo/paraview