Fande Kong <[email protected]> writes:

> On Fri, Oct 7, 2016 at 10:30 PM, Jed Brown <[email protected]> wrote:
>
>> Barry Smith <[email protected]> writes:
>>> There is still something wonky here, whether it is the MPI
>>> implementation or how PETSc handles the assembly. Without any values
>>> that need to be communicated, it is unacceptable that these calls take
>>> so long. If we understood __exactly__ why the performance suddenly
>>> drops so dramatically we could perhaps fix it. I do not understand why.
>>
>> I guess it's worth timing. If they don't have MPI_Reduce_scatter_block
>> then it falls back to a big MPI_Allreduce. After that, it's all
>> point-to-point messaging that shouldn't suck, and there actually
>> shouldn't be anything to send or receive anyway. The BTS implementation
>> should be much smarter and literally reduces to a barrier in this case.
>
> Hi Jed,
>
> How do I use the BTS implementation for Vec? For Mat, can we just use
> "-matstash_bts"?
For Vec, the option should be -vec_assembly_bts.
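
For anyone who wants to try this, here is a minimal sketch of a test
program (the file name and sizes are made up, and it assumes a PETSc
build from around the time of this thread, where the BTS path is
selected at runtime with -vec_assembly_bts). Each rank inserts only
locally owned entries, so the assembly should require no communication
at all; error checking is omitted for brevity:

  /* petscvectest.c (hypothetical name): assemble a Vec with only
     locally owned insertions. */
  #include <petscvec.h>

  int main(int argc, char **argv)
  {
    Vec         x;
    PetscInt    rstart, rend, i;
    PetscScalar one = 1.0;

    PetscInitialize(&argc, &argv, NULL, NULL);

    VecCreate(PETSC_COMM_WORLD, &x);
    VecSetSizes(x, 1000, PETSC_DECIDE);   /* 1000 local entries per rank */
    VecSetFromOptions(x);

    /* Insert values only into the locally owned range, so there is
       nothing to send or receive during assembly. */
    VecGetOwnershipRange(x, &rstart, &rend);
    for (i = rstart; i < rend; i++) VecSetValues(x, 1, &i, &one, INSERT_VALUES);

    /* With no off-process entries, the BTS implementation should reduce
       this to (roughly) a barrier instead of a big collective. */
    VecAssemblyBegin(x);
    VecAssemblyEnd(x);

    VecDestroy(&x);
    PetscFinalize();
    return 0;
  }

Run it with something like

  mpiexec -n 4 ./petscvectest -vec_assembly_bts -log_view

and compare the VecAssemblyBegin/VecAssemblyEnd timings against a run
without the option.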
