What Ralph said. You just blow memory on a queue that is not recovered in the
current implementation.
Also, moving to Allreduce will resolve the issue, since every call is then
effectively also a barrier. I have found that with some benchmarks and
collective implementations it can be faster than Reduce.
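
To illustrate the point, here is a toy model in plain Python. It is not MPI code and not how Open MPI is implemented; the function name, its parameters, and the drain rate are hypothetical and chosen only for illustration. It assumes that with Reduce the non-root ranks return as soon as their contribution is sent, so they can run ahead of the root, while with Allreduce no rank starts the next iteration until the current one has completed everywhere.

```python
from collections import deque


def peak_queue_depth(n_ranks, n_iters, root_drain_per_step, synchronizing):
    """Toy model: peak number of unprocessed contributions queued at the root.

    All names and rates here are hypothetical, for illustration only.
    """
    queue = deque()
    peak = 0
    for iteration in range(n_iters):
        # Every non-root rank sends one contribution per iteration.
        for rank in range(1, n_ranks):
            queue.append((iteration, rank))
        peak = max(peak, len(queue))

        if synchronizing:
            # Allreduce-like: nobody proceeds until this iteration's
            # contributions have all been consumed.
            queue.clear()
        else:
            # Reduce-like: senders move on immediately; the root drains
            # only as many messages as it can keep up with.
            for _ in range(min(root_drain_per_step, len(queue))):
                queue.popleft()
    return peak


if __name__ == "__main__":
    print("reduce-like   :", peak_queue_depth(4, 1000, 1, synchronizing=False))
    print("allreduce-like:", peak_queue_depth(4, 1000, 1, synchronizing=True))
```

With 4 ranks, 1000 iterations, and a root that drains one message per step, the Reduce-like case peaks at 2001 queued messages and keeps growing with the iteration count, while the Allreduce-like case never holds more than 3.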
Thank you, Nathan. This makes more sense now.
On Tue, Apr 16, 2019 at 6:48 AM Nathan Hjelm wrote:
> What Ralph said. You just blow memory on a queue that is not recovered in
> the current implementation.
>
> Also, moving to Allreduce will resolve the issue as now every call is
> effectively
After installing UCX 1.5.0 and OpenMPI 4.0.1 compiled for UCX and without
verbs (full details below), my NetPIPE benchmark is reporting message
failures for some message sizes above 300 KB. There are no failures when I
benchmark with a non-UCX (verbs) version of OpenMPI 4.0.1, and no failures