Barry Smith <[email protected]> writes: >> On May 29, 2015, at 2:29 PM, Jed Brown <[email protected]> wrote: >> >> Barry Smith <[email protected]> writes: >> >>> I cannot explain why the load balance would be 1.0 unless, by >>> unlikely coincidence on the 248 different calls to the function >>> different processes are the ones waiting so that the sum of the >>> waits on different processes matches over the 248 calls. Possible >>> but >> >> Uh, it's the same reason VecNorm often shows significant load imbalance. > > Uh, I don't understand. It shows NO imbalance but huge > times. Normally I would expect a large imbalance and huge times. So > I cannot explain why it has no imbalance. 1.0 means no imbalance.
Sorry, I mixed two comments. There are two non-scalable operations, determining ownership for outgoing entries (the loop I showed) and the huge MPI_Allreduce. Our timers can never observe load imbalance in the ownership determination because all that work is done after the timer has started and the timer can't end until after the MPI_Allreduce. The MPI_Allreduce is really expensive because it involves 1-2 MB from each of 128k cores. (MPI_Reduce_scatter_block is much better.) If incoming load imbalance is small relative to the ownership determination plus MPI_Allreduce, then we see 1.0. Putting a barrier before (sort of) guarantees that, but if it was already the case, the barrier won't change anything.
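
To make the Allreduce-vs-Reduce_scatter_block point concrete, here is a minimal MPI sketch (not the actual PETSc stash code; the variable names and toy counts are made up): each rank contributes an array of per-destination counts of length P, but only needs the single sum corresponding to itself, so MPI_Reduce_scatter_block returns O(1) data per rank where MPI_Allreduce returns the full O(P) array.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
  int rank, size;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  /* sendcounts[i] = number of entries this rank will send to rank i
     (dummy values for illustration) */
  int *sendcounts = (int*)malloc(size * sizeof(int));
  for (int i = 0; i < size; i++) sendcounts[i] = (rank + i) % 3;

  /* Non-scalable version: every rank reduces the full length-P array,
     then reads only its own entry. */
  int *work = (int*)malloc(size * sizeof(int));
  MPI_Allreduce(sendcounts, work, size, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
  int recv_allreduce = work[rank];

  /* Scalable version: each rank receives only the one sum it needs. */
  int recvcount;
  MPI_Reduce_scatter_block(sendcounts, &recvcount, 1, MPI_INT, MPI_SUM,
                           MPI_COMM_WORLD);

  if (recvcount != recv_allreduce) printf("[%d] mismatch!\n", rank);
  else printf("[%d] will receive %d entries\n", rank, recvcount);

  free(work);
  free(sendcounts);
  MPI_Finalize();
  return 0;
}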
