Just a few comments on the perf data.  I see a lot of load imbalance (~5x) on 
reduction operations like VecMDot and VecNorm (the max/min time ratio), but also 
quite a bit of imbalance on simple non-collective vector ops like VecAXPY (~3x).  
So if the work itself is well balanced, I would suspect it's a NUMA issue.
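
A quick thing to check, as a sanity test rather than a definitive diagnosis: 
make sure the MPI ranks are actually pinned to cores and that memory ends up on 
the socket that uses it.  A minimal sketch, assuming Open MPI and numactl are 
available (option spellings vary between MPI versions, and ./your_app is just a 
placeholder for your executable):

  numactl --hardware                       # list the NUMA nodes and how much memory each one has
  mpirun -np <nprocs> --report-bindings --bind-to-core ./your_app -log_summary

If ranks wander between sockets, or all the pages get first-touched on one 
memory controller, you will see exactly this kind of max/min spread on the 
vector kernels.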

Mark

On May 26, 2012, at 6:13 PM, Aron Roland wrote:

> Dear All,
> 
> I have a question about our recent implementation of PETSc for solving a 
> large linear system arising from a 4d problem on hybrid unstructured meshes. 
> 
> The point is that we have implemented all the mappings; the solution is 
> fine, and so is the number of iterations. The results are robust with respect 
> to the number of CPUs used, but we have a scaling issue. 
> 
> The system is an Intel cluster of the latest generation with InfiniBand.
> 
> We have attached the summary ... hopefully with a lot of information. 
> 
> Any comments, suggestions, or ideas are very welcome. 
> 
> We have been reading the threads that deal with multi-core and the 
> memory-bus limitation, so we are aware of this. 
> 
> I am now thinking about a hybrid OpenMP/MPI approach, but I am not quite 
> happy with the bus-limitation explanation, since most systems are multi-core. 
> 
> I hope the limitation is not the sparse matrix mapping that we are using ... 
> 
> Thanks in advance ...
> 
> Cheers
> 
> Aron 
> 
> 
> <benchmark.txt>
