Hi Szilárd,
Forgot to update you. We tried recompiling GROMACS with GPU support using GCC
instead of the Intel compiler. Now everything looks fine, so it seems the
root cause was a bug or compatibility issue between the Intel compiler
and NVCC or another component in GROMACS.
Hopefully this experience is u
Hi Teguh,
Unfortunately, I can't see anything out of the ordinary in these outputs,
and, admittedly, the library trace is what I was hoping would tell the most.
I can't exclude the possibility of this being a bug - either in GROMACS or
in one of the runtimes used. To test this and have a chance of tr
Hi Szilárd,
I tried running strace on one of the MPI ranks. Below are the outputs. There
are some timeouts in the OpenMP threads, but I have no idea what the root
cause is. Is it a bug in GROMACS, or maybe in MPI / OpenMP? Can you
see what the root cause is?
FYI, we use Intel compiler v15.0.2
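For reference, attaching strace to a single rank usually looks like the
sketch below. The target here is a throwaway `sleep` standing in for the
hanging rank; with a real hang you would substitute the mdrun rank's PID
(from `ps` or `top`). The option choices and file names are assumptions,
not taken from the thread.

```shell
#!/bin/sh
# Sketch: trace a process the way you would trace a hanging MPI rank.
# The `sleep` is a stand-in target; replace $PID with the real rank's PID.
sleep 2 &
PID=$!

if command -v strace >/dev/null 2>&1; then
    # -f follows the OpenMP worker threads, -tt adds wall-clock timestamps,
    # -T reports the time spent inside each syscall; output goes to a file.
    strace -f -tt -T -o rank.strace -p "$PID" 2>/dev/null || true
else
    echo "strace not installed on this node"
fi
wait "$PID" 2>/dev/null || true
echo "traced PID $PID" > trace-done.txt
```

In a real hang the interesting part is the tail of `rank.strace`: a thread
stuck in the same syscall (e.g. a futex wait) for minutes points at where
to look next.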
The only way to know more is to either attach a debugger to the hanging
process or possibly use ltrace/strace to see in which library or
syscall the process is hanging.
I suggest you try attaching a debugger and getting a stack trace (see
https://sourceware.org/gdb/onlinedocs/gdb/Attach.html)
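The debugger attach can also be done non-interactively; a minimal sketch,
where a throwaway `sleep` stands in for the hanging mdrun rank (with a real
hang you would pass that rank's PID instead):

```shell
#!/bin/sh
# Start a stand-in "hanging" process; with a real hang, use the mdrun
# rank's PID here instead of $!.
sleep 5 &
PID=$!

# -batch runs the listed commands and exits; "thread apply all bt" prints
# a backtrace for every thread, which shows where each OpenMP thread is
# stuck. The trace is saved so it can be posted to the list.
gdb -batch -ex "thread apply all bt" -p "$PID" > stack.txt 2>&1 || true

kill "$PID" 2>/dev/null || true
```

Note that attaching may require ptrace permission (root, or
`kernel.yama.ptrace_scope=0` on some distributions).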
Hi Stéphane,
Thanks for your reply.
Actually, everything is fine if we run a shorter GROMACS GPU job. Only when
we run a longer GROMACS GPU job (20+ hours of running) do we get this problem.
I recorded nvidia-smi output every 10 minutes. From these records, I doubt
temperature was the cause.
Before d
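The periodic recording described above can be done with a small logger
function; a sketch, assuming a POSIX shell on the GPU node (the log file
name and the queried fields are arbitrary choices, not from the thread):

```shell
#!/bin/sh
# Append one timestamped GPU snapshot to a log file. To reproduce the
# every-10-minutes record, run it from cron or a loop:
#   while true; do log_gpu_status gpu-monitor.log; sleep 600; done
log_gpu_status() {
    date '+%Y-%m-%d %H:%M:%S' >> "$1"
    if command -v nvidia-smi >/dev/null 2>&1; then
        # Compact CSV row: temperature, GPU utilization, memory in use.
        nvidia-smi --query-gpu=temperature.gpu,utilization.gpu,memory.used \
                   --format=csv,noheader >> "$1" 2>&1
    else
        echo "nvidia-smi not available on this node" >> "$1"
    fi
}

log_gpu_status gpu-monitor.log
```

Querying specific fields keeps the log compact and easy to plot, compared
with appending the full `nvidia-smi` screen every interval.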
On 29/09/2015 at 23:40, M Teguh Satria wrote:
Is any of you experiencing a similar problem? Is there any way to
troubleshoot/debug it to find the cause? I didn't get any warning or
error message.
Hello,
This can be a driver issue (or hardware, think of temperature, dust, ...),
and happens to
Hi All,
I am new to using GROMACS. I have a small HPC cluster, and one of the nodes
has a Tesla K40 GPU. I have a problem when trying to run a GROMACS GPU
job on that node. My job seems to hang after several hours of running:
the GROMACS log stops being updated, and through nvidia-smi I