Re: [OMPI users] Exhausting QPs?

2018-03-13 Thread Nathan Hjelm
Yalla works because MXM defaults to using unconnected datagrams (I don’t think it uses RC unless you ask). Is this a fully connected algorithm? I ask because (3584 - 28) * 28 * 3 (default number of QPs/remote process in btl/openib) = 298704 > 262144. This is the problem with RC. Mellanox solved
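The arithmetic in the message above can be checked with a short sketch. It assumes a fully connected job in which every process on a node opens RC connections to every process on other nodes, and it treats 262144 (2^18) as the QP limit quoted in the message. The function name `rc_qps_per_node` and the constant `QP_LIMIT` are hypothetical, chosen for illustration only; they are not part of Open MPI.

```python
# Sketch only: reproduces the QP count quoted in the message.
# Assumption: fully connected job, 3 QPs per remote process
# (the btl/openib default mentioned in the message).

QP_LIMIT = 262144  # limit quoted in the message (2**18)


def rc_qps_per_node(total_procs, procs_per_node, qps_per_peer=3):
    """QPs needed on one node when each local process connects
    to every process on every other node (hypothetical helper)."""
    remote_peers = total_procs - procs_per_node
    return remote_peers * procs_per_node * qps_per_peer


if __name__ == "__main__":
    # The two job sizes from the original report, 28 cores per node.
    for total in (1792, 3584):
        needed = rc_qps_per_node(total, 28)
        status = "exceeds" if needed > QP_LIMIT else "fits within"
        print(f"{total} processes: {needed} QPs per node, {status} {QP_LIMIT}")
```

Under these assumptions the 1792-process run needs 148176 QPs per node, which stays under the limit, while the 3584-process run needs 298704, which matches the figure in the message and is consistent with the failure being reported only at the larger size.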

[OMPI users] Exhausting QPs?

2018-03-13 Thread Ben Menadue
Hi, One of our users is having trouble scaling his code up to 3584 cores (i.e. 128 28-core nodes). It runs fine on 1792 cores (64 nodes), but fails with this at 3584: -- A process failed to create a queue pair. This usually