On May 26, 2012, at 6:32 PM, Jed Brown wrote:

> That load imbalance often comes from whatever came *before* the reduction.

Yes I assumed that was understood.
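A minimal sketch (not from the thread) of the effect Jed describes: a VecNorm ends in an MPI_Allreduce, so a rank that arrives late from the preceding, imbalanced phase makes the norm itself look slow on every other rank. Timing an explicit barrier separately attributes that inherited wait to the earlier work. TimedNorm is a hypothetical helper, and the error handling assumes 2012-era PETSc (CHKERRQ):

    #include <petscvec.h>

    /* Hypothetical helper: splits the time around a VecNorm into the wait
       inherited from earlier, imbalanced work (absorbed by the barrier)
       and the cost of the reduction itself. */
    static PetscErrorCode TimedNorm(Vec x, PetscReal *nrm)
    {
      PetscErrorCode ierr;
      MPI_Comm       comm;
      double         t0, t1, t2, wait, red, waitmax, redmax;

      PetscFunctionBegin;
      ierr = PetscObjectGetComm((PetscObject)x, &comm);CHKERRQ(ierr);
      t0   = MPI_Wtime();
      ierr = MPI_Barrier(comm);CHKERRQ(ierr);        /* absorbs imbalance from the preceding phase */
      t1   = MPI_Wtime();
      ierr = VecNorm(x, NORM_2, nrm);CHKERRQ(ierr);  /* the reduction itself */
      t2   = MPI_Wtime();
      wait = t1 - t0; red = t2 - t1;
      ierr = MPI_Allreduce(&wait, &waitmax, 1, MPI_DOUBLE, MPI_MAX, comm);CHKERRQ(ierr);
      ierr = MPI_Allreduce(&red,  &redmax,  1, MPI_DOUBLE, MPI_MAX, comm);CHKERRQ(ierr);
      ierr = PetscPrintf(comm, "max inherited wait %g s, max reduction %g s\n",
                         waitmax, redmax);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

If the inherited wait dominates the reduction time, the fix belongs in the phase before the norm (partitioning, assembly), not in the reduction itself.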
> On May 26, 2012 5:25 PM, "Mark F. Adams" <mark.adams at columbia.edu> wrote:
> Just a few comments on the perf data. I see a lot (~5x) of load imbalance on
> reduction stuff like VecMDot and VecNorm (Time Ratio, max/min), but even a lot
> of imbalance on simple non-collective vector ops like VecAXPY (~3x). So if
> the work is well load balanced, then I would suspect it's a NUMA issue.
>
> Mark
>
> On May 26, 2012, at 6:13 PM, Aron Roland wrote:
>
>> Dear All,
>>
>> I have a question about a recent implementation of PETSc for solving a
>> large linear system arising from a 4d problem on hybrid unstructured meshes.
>>
>> The point is that we have implemented all the mappings and the solution is
>> fine; the number of iterations too. The results are robust with respect to
>> the number of CPUs used, but we have a scaling issue.
>>
>> The system is an Intel cluster of the latest generation on InfiniBand.
>>
>> We have attached the summary ... with hopefully a lot of information.
>>
>> Any comments, suggestions, or ideas are very welcome.
>>
>> We have been reading the threads that deal with multi-core and the
>> bus-limitation stuff, so we are aware of this.
>>
>> I am now thinking about a hybrid OpenMP/MPI approach, but I am not quite
>> happy with the bus-limitation explanation; most systems are multicore.
>>
>> I hope the limitation is not the sparse matrix mapping that we are using
>> ...
>>
>> Thanks in advance ...
>>
>> Cheers
>>
>> Aron
>>
>> <benchmark.txt>
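A hedged illustration (not from the thread) of the NUMA effect Mark suspects and the bandwidth limit behind the "bus-limitation" threads: VecAXPY-type kernels do two loads and one store per iteration, so they are bound by memory bandwidth rather than FLOPs, and under Linux's first-touch policy each page lands on the NUMA node of the core that first writes it. The sizes and names below are illustrative.

    #include <stdio.h>
    #include <stdlib.h>

    /* A VecAXPY-shaped kernel: y <- y + alpha*x, streamed over arrays far
       larger than cache, so its speed is set by memory bandwidth. */
    int main(void)
    {
      const long   n     = 1L << 26;          /* 64M doubles, ~512 MB per array */
      const double alpha = 3.0;
      double *x = malloc(n * sizeof *x);
      double *y = malloc(n * sizeof *y);
      if (!x || !y) return 1;

      /* First touch: these writes decide which NUMA node owns each page.
         In a hybrid MPI/OpenMP code, initialize with the same thread and
         core layout that the compute loops will use later. */
      for (long i = 0; i < n; i++) { x[i] = 1.0; y[i] = 2.0; }

      /* The bandwidth-bound kernel.  If the pages above were touched on a
         different socket, this loop pays remote-memory traffic and the
         same code runs at different speeds on different ranks. */
      for (long i = 0; i < n; i++) y[i] += alpha * x[i];

      printf("%g\n", y[0]);                   /* keep the loops from being elided */
      free(x); free(y);
      return 0;
    }

Running a STREAM-type benchmark with 1, 2, 4, ... ranks per node shows where the node's bandwidth saturates; if the aggregate rate stops growing well before all cores are busy, adding ranks per node cannot speed up VecAXPY, and pinning ranks to cores (the MPI launcher's binding options, or numactl) removes the run-to-run variation that shows up as imbalance in the log summary.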
