On Fri, May 29, 2015 at 3:29 PM, Jed Brown <[email protected]> wrote:

> Barry Smith <[email protected]> writes:
>
> >   I cannot explain why the load balance would be 1.0 unless, by
> >   unlikely coincidence on the 248 different calls to the function
> >   different processes are the ones waiting so that the sum of the
> >   waits on different processes matches over the 248 calls. Possible
> >   but
>
> Uh, it's the same reason VecNorm often shows significant load imbalance.
>
> >> I've added a barrier in the code.
> >
> >    You don't need a barrier.  If you do not have a barrier you should
> >    see all the "wait time" now accumulate somewhere later in the code
> >    at the next reduction after the VecAssemblyBegin/End.
>
> Presumably he added a barrier *before* calling the function.  The
> function does a small amount of work (basically none because he has no
> off-process entries) and synchronizes (PetscMaxSum).  If there was load
> imbalance before calling VecAssemblyBegin, the timer would start at
> different times on each process, but end at about the same time.
>

Yeah, I realized that VecAssembly should show this load imbalance unless
there is a barrier before its timer starts.  So I'm not sure what is going on.
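
For reference, here is a toy model of the timing argument above.  It is
plain Python with no MPI or PETSc calls; the arrival times, the work cost,
and the names event_times and ratio are all made up for illustration.  It
assumes the logged ratio is max/min of the per-process time spent in the
event, and that nobody leaves the reduction before the last process
arrives.

```python
def event_times(arrivals, work, barrier_before):
    """Per-process time logged inside a synchronizing event (toy model)."""
    # Nobody can leave the reduction before the last process has arrived.
    sync_point = max(arrivals)
    if barrier_before:
        # A barrier in front of the timer makes every process start together.
        starts = [sync_point] * len(arrivals)
    else:
        # Without it, each timer starts whenever that process shows up.
        starts = list(arrivals)
    end = sync_point + work  # all processes finish at about the same time
    return [end - s for s in starts]


def ratio(times):
    """Load-balance ratio, assumed here to be max time over min time."""
    return max(times) / min(times)


# Hypothetical arrival times (seconds) caused by imbalance earlier in the code.
arrivals = [0.0, 0.5, 1.0, 3.0]
work = 0.25  # small cost of the event itself

no_barrier = event_times(arrivals, work, barrier_before=False)
with_barrier = event_times(arrivals, work, barrier_before=True)

print("ratio without barrier:", ratio(no_barrier))
print("ratio with barrier:   ", ratio(with_barrier))
```

In this model the event only reports a ratio of 1.0 when the barrier sits
in front of the timer; the wait itself does not go away, it is just
charged to the barrier instead of to the event.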
