Re: [gmx-users] Gromacs GPU got hang

2015-11-03 Thread M Teguh Satria
Hi Szilárd, I forgot to update you. We recompiled the GPU build of Gromacs with the GCC compiler instead of the Intel compiler, and now everything looks fine. So it seems the root cause was a bug or compatibility issue between the Intel compiler and NVCC or other components in Gromacs. Hopefully this experience is u

Re: [gmx-users] Gromacs GPU got hang

2015-10-13 Thread Szilárd Páll
Hi Teguh, Unfortunately, I can't see anything out of the ordinary in these outputs and, admittedly, the library trace is what I was hoping would tell the most. I can't exclude the possibility of this being a bug, either in GROMACS or in one of the runtimes used. To test this and have a chance of tr

Re: [gmx-users] Gromacs GPU got hang

2015-10-12 Thread M Teguh Satria
Hi Szilárd, I ran strace on one of the MPI ranks; the output is below. There are some timeouts in an OpenMP thread, but I have no idea what the root cause is. Is it some kind of bug in Gromacs, or maybe in MPI / OpenMP? Can you see what the root cause is? FYI, we use Intel compiler v15.0.2

Re: [gmx-users] Gromacs GPU got hang

2015-10-01 Thread Szilárd Páll
The only way to know more is to either attach a debugger to the hanging process or possibly use ltrace/strace to see in which library or syscall the process is hanging. I suggest you try attaching a debugger and getting a stack trace (see https://sourceware.org/gdb/onlinedocs/gdb/Attach.html)
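
[Editor's sketch] The stack-trace suggestion above could be scripted roughly as follows. This is only a minimal sketch, not something posted in the thread: the helper names `gdb_backtrace_command` and `dump_backtrace` are hypothetical, and it assumes gdb is installed on the GPU node and that the caller is allowed to attach to the hanging process. The gdb options used (`-p`, `-batch`, `-ex`) are standard; batch mode detaches and exits once the commands have run.

```python
import subprocess
from typing import List


def gdb_backtrace_command(pid: int) -> List[str]:
    """Build a gdb command line that attaches to `pid`, prints the
    stack of every thread, then exits (hypothetical helper)."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    # -p PID  : attach to the running process
    # -batch  : run the given commands non-interactively, then exit
    # -ex CMD : execute one gdb command
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_backtrace(pid: int, timeout_s: float = 60.0) -> str:
    """Run gdb against `pid` and return its combined output.

    Assumption: gdb is on PATH and ptrace attachment is permitted."""
    result = subprocess.run(
        gdb_backtrace_command(pid),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,  # gdb may return non-zero; keep whatever it printed
    )
    return result.stdout + result.stderr
```

Calling `dump_backtrace(pid)` with the PID of one hanging mdrun rank would return the per-thread stacks as text, which could then be pasted into a reply on the list.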

Re: [gmx-users] Gromacs GPU got hang

2015-09-30 Thread M Teguh Satria
Hi Stéphane, Thanks for your reply. Actually, everything is fine when we run shorter Gromacs GPU jobs. We only hit this problem with longer Gromacs GPU jobs (20+ hours of running). I recorded nvidia-smi output every 10 minutes. From these records, I doubt that temperature was the cause. Before d

Re: [gmx-users] Gromacs GPU got hang

2015-09-30 Thread Téletchéa Stéphane
On 29/09/2015 23:40, M Teguh Satria wrote: Are any of you experiencing a similar problem? Is there any way to troubleshoot/debug to see the cause? Because I didn't get any warning or error message. Hello, This can be a driver issue (or hardware, think of temperature, dust, ...), and happens to

[gmx-users] Gromacs GPU got hang

2015-09-29 Thread M Teguh Satria
Hi All, I am new to Gromacs. I have a small HPC cluster, and one of the nodes has a Tesla K40 GPU. I have a problem when running a Gromacs GPU job on that GPU node. It seems my job hangs after several hours of running: the Gromacs log stopped being updated, and through nvidia-smi I

[gmx-users] Gromacs GPU got hang

2015-09-28 Thread M Teguh Satria
Hi All, I am new to Gromacs. I have a small HPC cluster, and one of the nodes has a Tesla K40 GPU. I have a problem when running a Gromacs GPU job on that GPU node. It seems my job hangs after several hours of running: the Gromacs log stopped being updated, and through nvidia-smi I