On May 26, 2012, at 6:32 PM, Jed Brown wrote:

> That load imbalance often comes from whatever came *before* the reduction.
> 
> 
Yes, I assumed that was understood.
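
A minimal sketch of that point (hypothetical timing code, not from the
attached benchmark; it assumes only MPI and C): if a barrier placed right
before the reduction absorbs most of the wait time, then the imbalance was
produced by the earlier work, and VecNorm/VecMDot are simply where the slow
ranks finally get noticed.

/* Hypothetical sketch: compare reduction time with and without a
 * preceding barrier.  A large barrier time plus a small post-barrier
 * reduction time means the imbalance comes from the work before the
 * reduction, not from the reduction itself. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  double local = (double)rank, global;

  /* Unsynchronized: wait time from earlier imbalance shows up here. */
  double t0 = MPI_Wtime();
  MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
  double t_reduce = MPI_Wtime() - t0;

  /* Synchronized: the barrier absorbs the imbalance, so what remains is
   * closer to the true cost of the collective itself. */
  double t1 = MPI_Wtime();
  MPI_Barrier(MPI_COMM_WORLD);
  double t_barrier = MPI_Wtime() - t1;

  double t2 = MPI_Wtime();
  MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
  double t_reduce_sync = MPI_Wtime() - t2;

  printf("[%d] reduce %.3e  barrier %.3e  reduce after barrier %.3e\n",
         rank, t_reduce, t_barrier, t_reduce_sync);

  MPI_Finalize();
  return 0;
}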

> On May 26, 2012 5:25 PM, "Mark F. Adams" <mark.adams at columbia.edu> wrote:
> Just a few comments on the perf data.  I see a lot of load imbalance (~5x) on 
> reduction operations like VecMDot and VecNorm (time ratio, max/min), but also a lot 
> of imbalance on simple non-collective vector ops like VecAXPY (~3x).  So if 
> the work is well load balanced, then I would suspect it's a NUMA issue.
> 
> Mark
> 
> On May 26, 2012, at 6:13 PM, Aron Roland wrote:
> 
>> Dear All,
>> 
>> I have some questions about a recent implementation using PETSc for solving a 
>> large linear system arising from a 4D problem on hybrid unstructured meshes. 
>> 
>> The point is that we have implemented all the mappings; the solution is fine, 
>> and so is the number of iterations. The results are robust with respect to 
>> the number of CPUs used, but we have a scaling issue. 
>> 
>> The system is a latest-generation Intel cluster with InfiniBand.
>> 
>> We have attached the log summary ... with hopefully a lot of information. 
>> 
>> Any comments, suggestions, ideas are very welcome. 
>> 
>> We have been reading the threads dealing with multi-core and 
>> memory-bus limitations, so we are aware of this. 
>> 
>> I am now thinking about a hybrid OpenMP/MPI approach, but I am not quite happy 
>> with the bus-limitation explanation, since most systems are multicore these days. 
>> 
>> I hope the limitation is not the sparse matrix mapping that we are 
>> using ... 
>> 
>> Thanks in advance ...
>> 
>> Cheers
>> 
>> Aron 
>> 
>> <benchmark.txt>
> 
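
Regarding the NUMA / memory-bus question in the quoted message: one simple
check is to measure per-rank streaming bandwidth while all ranks run at the
same time, once with a single rank per node and once fully packed. A minimal
sketch (hypothetical code, assuming only MPI and C99; the array size and
repetition count are arbitrary choices):

/* Hypothetical sketch: STREAM-like triad per MPI rank.  If per-rank
 * bandwidth collapses when the node is fully packed, the limit is
 * memory bandwidth / NUMA placement rather than the matrix mapping. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  const size_t n = 5 * 1000 * 1000;   /* ~40 MB per array, far out of cache */
  const int reps = 10;
  double *a = malloc(n * sizeof(double));
  double *b = malloc(n * sizeof(double));
  double *c = malloc(n * sizeof(double));
  for (size_t i = 0; i < n; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

  MPI_Barrier(MPI_COMM_WORLD);        /* start all ranks together */
  double t0 = MPI_Wtime();
  for (int r = 0; r < reps; r++)
    for (size_t i = 0; i < n; i++)
      a[i] = b[i] + 3.0 * c[i];       /* triad: 2 loads + 1 store */
  double t = MPI_Wtime() - t0;

  /* 3 arrays * 8 bytes * n elements * reps, ignoring write-allocate traffic */
  double gbps = 3.0 * 8.0 * (double)n * reps / t / 1e9;
  printf("[%3d/%d] triad %.2f GB/s (check %.1f)\n", rank, size, gbps, a[n / 2]);

  free(a); free(b); free(c);
  MPI_Finalize();
  return 0;
}

If the fully packed run shows a sharp per-rank drop, the ~3x VecAXPY ratios in
the log would be consistent with bandwidth saturation; a hybrid OpenMP/MPI
version would not by itself remove that limit.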
