Just a few comments on the perf data. I see a lot of load imbalance (~5x) on reductions like VecMDot and VecNorm (the max/min Time Ratio in the log summary), but also a lot of imbalance (~3x) on simple, non-collective vector ops like VecAXPY. So if the work is well load balanced, then I would suspect it's a NUMA issue.
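
If it helps to separate a NUMA/binding problem from communication, here is a rough sketch (not Aron's code; the vector size and repeat counts are arbitrary) that times a purely local op (VecAXPY) and a reduction (VecNorm) on each rank and prints the max/min spread, i.e. the same ratio the Time Ratio column in the log summary reports. It assumes recent PETSc calling conventions (PetscTime(&t), VecDestroy(&x)).

/* Sketch: per-rank timing of a local op and a reduction, reporting max/min spread. */
#include <petscvec.h>
#include <petsctime.h>

int main(int argc, char **argv)
{
  Vec            x, y;
  PetscInt       N = 1000000;   /* arbitrary global size, adjust as needed */
  PetscScalar    alpha = 2.0;
  PetscReal      nrm;
  PetscLogDouble t0, t1, dt, dtmin, dtmax;
  PetscInt       i;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);
  ierr = VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, N, &x);CHKERRQ(ierr);
  ierr = VecDuplicate(x, &y);CHKERRQ(ierr);
  ierr = VecSet(x, 1.0);CHKERRQ(ierr);
  ierr = VecSet(y, 2.0);CHKERRQ(ierr);

  /* Purely local operation: any max/min spread here is memory bandwidth,
     NUMA placement, or process binding, not communication. */
  ierr = PetscTime(&t0);CHKERRQ(ierr);
  for (i = 0; i < 100; i++) { ierr = VecAXPY(y, alpha, x);CHKERRQ(ierr); }
  ierr = PetscTime(&t1);CHKERRQ(ierr);
  dt = t1 - t0;
  ierr = MPI_Allreduce(&dt, &dtmin, 1, MPI_DOUBLE, MPI_MIN, PETSC_COMM_WORLD);CHKERRQ(ierr);
  ierr = MPI_Allreduce(&dt, &dtmax, 1, MPI_DOUBLE, MPI_MAX, PETSC_COMM_WORLD);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "VecAXPY time max/min = %g\n", (double)(dtmax/dtmin));CHKERRQ(ierr);

  /* Reduction: adds the synchronization cost on top of the local work. */
  ierr = PetscTime(&t0);CHKERRQ(ierr);
  for (i = 0; i < 100; i++) { ierr = VecNorm(y, NORM_2, &nrm);CHKERRQ(ierr); }
  ierr = PetscTime(&t1);CHKERRQ(ierr);
  dt = t1 - t0;
  ierr = MPI_Allreduce(&dt, &dtmin, 1, MPI_DOUBLE, MPI_MIN, PETSC_COMM_WORLD);CHKERRQ(ierr);
  ierr = MPI_Allreduce(&dt, &dtmax, 1, MPI_DOUBLE, MPI_MAX, PETSC_COMM_WORLD);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "VecNorm time max/min = %g\n", (double)(dtmax/dtmin));CHKERRQ(ierr);

  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&y);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

If the VecAXPY ratio alone is already ~3x, pinning the MPI processes to cores/sockets (e.g. Open MPI's --bind-to-core, or whatever your MPI's equivalent is) and keeping each rank's data on its own socket's memory usually helps more than anything on the solver side.
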
Mark

On May 26, 2012, at 6:13 PM, Aron Roland wrote:

> Dear All,
>
> I have some questions about a recent implementation of PETSc for solving a
> large linear system arising from a 4D problem on hybrid unstructured meshes.
>
> The point is that we have implemented all the mappings, and the solution is
> fine, as is the number of iterations. The results are robust with respect to
> the number of CPUs used, but we have a scaling issue.
>
> The system is an Intel cluster of the latest generation on InfiniBand.
>
> We have attached the summary ... with hopefully a lot of information.
>
> Any comments, suggestions, or ideas are very welcome.
>
> We have been reading the threads that deal with multi-core and
> the bus-limitation issues, so we are aware of this.
>
> I am now thinking about a hybrid OpenMP/MPI approach, but I am not quite happy with
> the bus-limitation explanation, since most systems are multicore.
>
> I hope the limitation is not the sparse matrix mapping that we are using ...
>
> Thanks in advance ...
>
> Cheers
>
> Aron
>
> <benchmark.txt>
