On Fri, May 29, 2015 at 3:29 PM, Jed Brown <[email protected]> wrote:
> Barry Smith <[email protected]> writes:
>
> > I cannot explain why the load balance would be 1.0 unless, by
> > unlikely coincidence on the 248 different calls to the function
> > different processes are the ones waiting so that the sum of the
> > waits on different processes matches over the 248 calls. Possible
> > but
>
> Uh, it's the same reason VecNorm often shows significant load imbalance.
>
> >> I've added a barrier in the code.
> >
> > You don't need a barrier. If you do not have a barrier you should
> > see all the "wait time" now accumulate somewhere later in the code
> > at the next reduction after the VecAssemblyBegin/End.
>
> Presumably he added a barrier *before* calling the function. The
> function does a small amount of work (basically none because he has no
> off-process entries) and synchronizes (PetscMaxSum). If there was load
> imbalance before calling VecAssemblyBegin, the timer would start at
> different times on each process, but end at about the same time.
>

Yea, I realized that VecAssembly should see this load imbalance unless it had a barrier before its timer. So I'm not sure what is going on.
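
To make the timing argument concrete, here is a toy model in Python. It is only a sketch, not PETSc's logging code: the arrival times, the `work` cost, and the helper names `timed_sync` and `ratio` are all made up for illustration, and I am assuming the reported ratio is max time over min time across ranks.

```python
def timed_sync(arrivals, work=0.001):
    """Per-rank elapsed time for an event that does `work` seconds of
    local work and then a blocking reduction (hypothetical model)."""
    # Nobody leaves the reduction until the last rank has arrived.
    finish = max(arrivals) + work
    # Each rank's timer starts when that rank arrives.
    return [finish - a for a in arrivals]


def ratio(times):
    """Max/min time across ranks (assumed definition of the ratio)."""
    return max(times) / min(times)


# Hypothetical times at which four ranks reach the assembly call.
arrivals = [0.0, 0.2, 0.5, 1.0]

# No barrier: the early ranks sit inside the timed event waiting.
no_barrier = timed_sync(arrivals)

# Barrier first: the wait is absorbed by the barrier, and every rank
# starts its timer at the same moment.
barrier_wait = [max(arrivals) - a for a in arrivals]
with_barrier = timed_sync([max(arrivals)] * len(arrivals))

print("ratio without barrier:", ratio(no_barrier))
print("time spent in barrier:", barrier_wait)
print("ratio with barrier:   ", ratio(with_barrier))
```

In this model the imbalance does not go away when the barrier is added; it just gets charged to the barrier instead of the timed event, which then reports a ratio of 1.0.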
