Is the file I/O that you mentioned using MPI I/O? If yes, what
file system are you writing to?
Edgar
On 4/5/2018 10:15 AM, Noam Bernstein wrote:
On Apr 5, 2018, at 11:03 AM, Reuti <re...@staff.uni-marburg.de> wrote:
Hi,
Am 05.04.2018 um 16:16 schrieb Noam Bernstein <noam.bernst...@nrl.navy.mil>:
Hi all - I have a code that uses MPI (vasp), and it’s hanging in a strange way.
Basically, there’s a Cartesian communicator, 4x16 (64 processes total), and
despite the fact that the communication pattern is rather regular, one
particular send/recv pair hangs consistently. Basically, across each row of 4,
task 0 receives from 1,2,3, and tasks 1,2,3 send to 0. On most of the 16 such
sets all those send/recv pairs complete. However, on 2 of them, it hangs (both
the send and recv). I have stack traces (with gdb -p on the running processes)
from what I believe are corresponding send/recv pairs.
<snip>
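To make the pattern above concrete, here is a minimal sketch of the rank layout being described: 64 ranks in a 4x16 Cartesian grid, where within each row of 4 the ranks at positions 1-3 send to the rank at position 0. Row-major rank ordering and the names ROWS, COLS, and pattern() are illustrative assumptions, not taken from the original code.

```python
# Sketch (assumed layout, not from VASP itself): 16 rows of 4 ranks each,
# ranks numbered row-major, so rank r sits at (r // 4, r % 4).
ROWS, COLS = 16, 4

def pattern():
    """Return {row_root: [senders]} for the gather-within-row pattern:
    in each row of 4, positions 1..3 send to the row's position 0."""
    groups = {}
    for rank in range(ROWS * COLS):
        row, pos = rank // COLS, rank % COLS
        root = row * COLS  # rank at position 0 in this row
        groups.setdefault(root, [])
        if pos != 0:
            groups[root].append(rank)
    return groups

if __name__ == "__main__":
    p = pattern()
    print(len(p))   # 16 independent send/recv groups
    print(p[0])     # first row: ranks 1, 2, 3 send to rank 0
```

The point of the sketch is that the 16 groups are independent of one another, which is why a hang in only 2 of them (while the other 14 complete) looks like state corruption rather than a logic error in the pattern itself.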
This is with OpenMPI 3.0.1 (same for 3.0.0, haven’t checked older versions),
Intel compilers (17.2.174). It seems to be independent of which nodes, always
happens on this pair of calls and happens after the code has been running for a
while, and the same code for the other 14 sets of 4 work fine, suggesting that
it’s an MPI issue, rather than an obvious bug in this code or a hardware
problem. Does anyone have any ideas, either about possible causes or how to
debug things further?
Do you use ScaLAPACK, and which type of BLAS/LAPACK? I used Intel MKL with the
Intel compilers for VASP and found that a self-compiled ScaLAPACK works fine in
combination with Open MPI. Using Intel's ScaLAPACK with Intel MPI also works
fine. What I never got working was the combination of Intel's ScaLAPACK and
Open MPI: at one point a process received a message from the wrong rank, IIRC.
I tried both the Intel-supplied Open MPI version of ScaLAPACK and compiling the
necessary interface myself for Open MPI in $MKLROOT/interfaces/mklmpi, with
identical results.
MKL BLAS/LAPACK, with my own self-compiled ScaLAPACK, but in this run I set
LSCALAPACK = .FALSE. I suppose I could try compiling without it just to test.
In any case, the hang happens while it's writing out the wavefunctions, which
I would assume to be unrelated to ScaLAPACK operations (unless they're
corrupting some low-level MPI state, I guess).
Noam
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users