> On Jun 19, 2019, at 5:05 PM, Joshua Ladd <jladd.m...@gmail.com> wrote:
>
> Hi, Noam
>
> Can you try your original command line with the following addition:
>
> mpirun --mca pml ucx --mca btl ^vader,tcp,openib -mca osc ucx
>
> I think we're seeing some conflict between UCX PML and UCT OSC.
I did this, although in the meantime I also did a clean compile (to add some debugging statements) and switched from running on 1 node (36 cores) to 2 nodes. The symptoms have changed slightly but are still similar. Memory no longer keeps growing until the node runs out. Instead, one node (the head node?) is using 55 GB, while the other is using only 23 GB. The latter value (23 GB) is consistent with the per-process usage reported by ps or top (36 × 640 MB per process). When I kill the job, the node that was using 55 GB drops only to 34 GB (with nothing running), while the other drops to about 1 GB.

Noam

____________
||
|U.S. NAVAL|
|_RESEARCH_|
LABORATORY

Noam Bernstein, Ph.D.
Center for Materials Physics and Technology
U.S. Naval Research Laboratory
T +1 202 404 8628  F +1 202 404 7546
https://www.nrl.navy.mil
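The consistency check for the second node is plain arithmetic. Below is a minimal Python sketch that redoes it with only the figures from the report. It assumes decimal units (1 GB = 1000 MB) and roughly uniform RSS of 640 MB per rank. The constant names are illustrative and not taken from any tool.

```python
# Redo the per-node memory arithmetic using only the numbers reported above.
# Assumptions: decimal units (1 GB = 1000 MB), ~640 MB RSS per rank, 36 ranks per node.
RANKS_PER_NODE = 36
RSS_PER_RANK_MB = 640

expected_gb = RANKS_PER_NODE * RSS_PER_RANK_MB / 1000  # memory explained by rank RSS
head_node_used_gb = 55        # head node usage while the job runs
head_node_after_kill_gb = 34  # head node usage after the job is killed

# Portion of the head node's usage that the ranks' RSS does not account for.
unaccounted_gb = head_node_used_gb - expected_gb

print(f"expected per node from RSS: {expected_gb:.2f} GB")
print(f"head node usage not explained by rank RSS: {unaccounted_gb:.2f} GB")
print(f"head node usage after kill: {head_node_after_kill_gb} GB")
```

That works out to about 23 GB per node from RSS. The roughly 32 GB on the head node that this does not explain is in the same range as the 34 GB that remains in use after the job is killed.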
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users