On Fri, Oct 7, 2016 at 10:30 PM, Jed Brown <[email protected]> wrote:
> Barry Smith <[email protected]> writes:
> >
> >    There is still something wonky here, whether it is the MPI
> > implementation or how PETSc handles the assembly. Without any values
> > that need to be communicated it is unacceptable that these calls take
> > so long. If we understood __exactly__ why the performance suddenly
> > drops so dramatically we could perhaps fix it. I do not understand why.
>
> I guess it's worth timing.  If they don't have MPI_Reduce_scatter_block
> then it falls back to a big MPI_Allreduce.  After that, it's all
> point-to-point messaging that shouldn't suck and there actually
> shouldn't be anything to send or receive anyway.  The BTS implementation
> should be much smarter and literally reduces to a barrier in this case.

Hi Jed,

How do I use the BTS implementation for Vec? For Mat, we may just use
"-matstash_bts"?

Fande,
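P.S. For reference, a minimal sketch (not from this thread; the executable
name and matrix sizes are made up) of an assembly where off-process values
go through the MatStash, so the "-matstash_bts" option mentioned above can
be tried on the command line, e.g. "mpiexec -n 2 ./stash_test -matstash_bts".
Whether an analogous option exists for Vec assembly is exactly the question.

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A;
  PetscInt       row = 0, col = 0;
  PetscScalar    v = 1.0;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 100, 100);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatSetUp(A);CHKERRQ(ierr);
  /* Every rank inserts into global entry (0,0); on ranks that do not own
     row 0 the value goes into the MatStash, whose communication during
     MatAssemblyBegin/End is what the BTS option is supposed to affect. */
  ierr = MatSetValues(A, 1, &row, 1, &col, &v, INSERT_VALUES);CHKERRQ(ierr);
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}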
