[OMPI users] openmpi crashes for more than 1 MPI

2018-03-13 Thread abhisek Mondal
Hi, I'm having a strange issue with Open MPI 1.4. Whenever I try to run a program with more than one MPI process, it crashes. For instance, the following command: mpirun -np 2 -bynode `which relion_refine_mpi` --gpu --tau2_fudge 2 --scale --dont_combine_weights_via_disc --iter 25 --norm

Re: [OMPI users] openmpi crashes for more than 1 MPI

2018-03-13 Thread Gilles Gouaillardet
Hi, I think it is really time to upgrade Open MPI. Supported versions are 2.1.2 and 3.0.0. Open MPI 1.4 is really old now, and I doubt you will ever get any support on that version. Cheers, Gilles On 3/13/2018 3:58 PM, abhisek Mondal wrote: Hi, I'm having a strange issue with

Re: [OMPI users] How to Build OpenMPI to support FDR over SR-IOV

2018-03-13 Thread Jeff Squyres (jsquyres)
Pharthiphan -- No need to cross-post the same question in three places (GitHub issue, this list, and the devel list). Let's keep the thread on the devel list, where the first parts of your questions have already been answered. Thanks. > On Mar 13, 2018, at 11:30 AM, Pharthiphan Asokan

Re: [MTT users] MTT username/password and report upload

2018-03-13 Thread Jeff Squyres (jsquyres)
Yes, it's trivial to reset the Fujitsu MTT password -- I'll send you a mail off-list with the new password. If you're just starting up with MTT, you might want to use the Python client instead. That's where 95% of ongoing development is occurring. If all goes well, I plan to sit down with

[OMPI users] Exhausting QPs?

2018-03-13 Thread Ben Menadue
Hi, One of our users is having trouble scaling his code up to 3584 cores (i.e. 128 28-core nodes). It runs fine on 1792 cores (64 nodes), but fails with this at 3584: -- A process failed to create a queue pair. This

Re: [MTT users] MTT username/password and report upload

2018-03-13 Thread Kawashima, Takahiro
Jeff, Thank you. I received the password. I cannot remember having received it before... My colleague was working with the Perl client before, but the work was suspended because his job changed. That is why we currently use the Perl client. We want to change it to the Python client if

Re: [OMPI users] Exhausting QPs?

2018-03-13 Thread Nathan Hjelm
Yalla works because MXM defaults to using unconnected datagrams (I don’t think it uses RC unless you ask). Is this a fully connected algorithm? I ask because (3584 - 28) * 28 * 3 (default number of QPs/remote process in btl/openib) = 298704 > 262144. This is the problem with RC. Mellanox solved
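
The queue-pair estimate quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic in the message, assuming a fully connected communication pattern in which every local rank opens RC connections to every rank on every other node. The function name `rc_qps_per_node` is hypothetical, and treating 262144 as a per-node cap is an assumption based on how the figure is used in the message.

```python
# Sketch of the QP arithmetic from the message above (not Open MPI code).
# Assumption: fully connected job, every local rank talks to every remote rank.

QP_LIMIT = 262144  # figure quoted in the thread (2**18); assumed to be a per-node cap


def rc_qps_per_node(total_cores, cores_per_node, qps_per_peer=3):
    """Estimate the RC queue pairs one node needs.

    qps_per_peer=3 is the default number of QPs per remote process
    in btl/openib, as stated in the message.
    """
    remote_ranks = total_cores - cores_per_node  # ranks on other nodes
    return remote_ranks * cores_per_node * qps_per_peer


if __name__ == "__main__":
    for total in (1792, 3584):  # the two job sizes mentioned in the thread
        needed = rc_qps_per_node(total, 28)
        status = "exceeds" if needed > QP_LIMIT else "fits within"
        print(f"{total} cores: {needed} QPs per node, {status} {QP_LIMIT}")
```

Under these assumptions the 1792-core job needs 148176 QPs per node and the 3584-core job needs 298704, which matches the report that the smaller run works while the larger one fails.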