> On Oct 7, 2016, at 10:44 PM, Jed Brown <[email protected]> wrote:
>
> Barry Smith <[email protected]> writes:
>> VecAssemblyBegin/End() does a couple of all reduces and then message
>> passing (if values need to be moved) to get the values onto the correct
>> processes. So these calls should take very little time. Something is wonky
>> on your system with that many MPI processes, with these calls. I don't know
>> why, if you look at the code you'll see it is pretty straightforward.
>
> Those MPI calls can be pretty sucky on some networks. Dave encountered
> this years ago when they were using VecSetValues/VecAssembly rather
> heavily. I think that most performance-aware PETSc applications
> typically never tried to use VecSetValues/VecAssembly or they did not
> need to do it very often (e.g., as part of a matrix-free solver). The
> BTS implementation fixes the performance issue, but I'm still working on
> solving the corner case that has been reported. Fortunately, the
> VecAssembly is totally superfluous to this user.
Jed,
There is still something wonky here, whether it is the MPI implementation
or how PETSc handles the assembly. When no values need to be
communicated, it is unacceptable that these calls take so long. If we understood
__exactly__ why the performance suddenly drops so dramatically, we could perhaps
fix it. I do not understand why.
Barry