Re: [gmx-users] Gromacs 2019.4 - cudaStreamSynchronize failed issue

2019-12-05 Thread Szilárd Páll
Can you please file an issue on redmine.gromacs.org and attach the inputs that reproduce the behavior described?
--
Szilárd

On Wed, Dec 4, 2019, 21:35 Chenou Zhang wrote:
> We did test that.
> Our cluster has total 11 GPU nodes and I ran 20 tests over all of them. 7
> out of the 20 tests did
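For reference, a minimal way to bundle the reproducer for the issue tracker, assuming the run directory holds the usual .tpr, .mdp, and .log files (the file names below are placeholders, not taken from the thread):

```bash
# Collect the input and log files that reproduce the energy jump (names are examples).
tar czf gmx-energy-jump-repro.tar.gz topol.tpr grompp.mdp md.log
```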

Re: [gmx-users] Gromacs 2019.4 - cudaStreamSynchronize failed issue

2019-12-04 Thread Chenou Zhang
We did test that. Our cluster has a total of 11 GPU nodes and I ran 20 tests over all of them. 7 out of the 20 tests did have the potential energy jump issue, and they were running on 5 different nodes. So I tend to believe this issue can happen on any of those nodes.

On Wed, Dec 4, 2019 at 1:14 PM
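A sketch of how such a sweep over the GPU nodes could be scripted as a SLURM array job; the partition name, output layout, and input file name are assumptions for illustration, not details from the thread:

```bash
#!/bin/bash
#SBATCH --job-name=gmx-gpu-test
#SBATCH --partition=gpu          # assumed partition name
#SBATCH --gres=gpu:1
#SBATCH --array=1-20             # 20 independent test runs
#SBATCH --output=gmx-test-%a.out

mkdir -p test_${SLURM_ARRAY_TASK_ID}
cd test_${SLURM_ARRAY_TASK_ID}
# Record which node the replica ran on, so failures can be mapped back to hardware.
echo "running on $(hostname)" > node.txt
gmx mdrun -v -s ../topol.tpr -deffnm md
```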

Re: [gmx-users] Gromacs 2019.4 - cudaStreamSynchronize failed issue

2019-12-04 Thread Szilárd Páll
The fact that you are observing errors along with energies that are off by so much, and that it reproduces with multiple inputs, suggests that this may not be a code issue. Did you do all runs that failed on the same hardware? Have you excluded the option that one of those GeForce cards may be flaky? --
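One way to look for a flaky card, assuming shell access to the node in question (reading the kernel log may require elevated privileges):

```bash
# Query per-GPU temperature, clocks, and utilization; unstable clocks or thermal
# throttling on a single card can point to failing hardware.
nvidia-smi --query-gpu=index,name,temperature.gpu,clocks.sm,utilization.gpu --format=csv

# Look for NVIDIA Xid errors (GPU faults) reported by the kernel.
dmesg | grep -i xid
```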

Re: [gmx-users] Gromacs 2019.4 - cudaStreamSynchronize failed issue

2019-12-04 Thread Chenou Zhang
We tried the same gmx settings in 2019.4 with different protein systems, and we got the same weird potential energy jump within 1000 steps.
```
           Step           Time
              0        0.00000

   Energies (kJ/mol)
           Bond            U-B    Proper Dih.  Improper Dih.      CMAP Dih.
```
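To see the jump as a time series rather than picking it out of the log, the potential energy can be extracted from the energy file with gmx energy; the file names below are mdrun defaults and may differ from the actual runs:

```bash
# Write the potential-energy term to an .xvg file; the term name is read from stdin.
echo "Potential" | gmx energy -f md.edr -o potential.xvg
```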

Re: [gmx-users] Gromacs 2019.4 - cudaStreamSynchronize failed issue

2019-12-03 Thread Chenou Zhang
Hi, I've run 30 tests with the -notunepme option. I got the following error from one of them (which is still the same *cudaStreamSynchronize failed* error):
```
DD  step 1422999  vol min/aver 0.639  load imb.: force  1.1%  pme mesh/force 1.079
           Step           Time
        1423000
```
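A quick way to count how many of the 30 runs hit the same failure, assuming each run wrote its log into its own directory (the layout and log name here are assumptions):

```bash
# List the runs whose log contains the cudaStreamSynchronize failure, then count them.
grep -l "cudaStreamSynchronize failed" test_*/md.log
grep -l "cudaStreamSynchronize failed" test_*/md.log | wc -l
```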

Re: [gmx-users] Gromacs 2019.4 - cudaStreamSynchronize failed issue

2019-12-02 Thread Chenou Zhang
For the error:
```
step 4400: timed with pme grid 96 96 60, coulomb cutoff 1.446: 467.9 M-cycles
step 4600: timed with pme grid 96 96 64, coulomb cutoff 1.372: 451.4 M-cycles
/var/spool/slurmd/job2321134/slurm_script: line 44: 29866 Segmentation fault      gmx mdrun -v -s $TPR -deffnm
```
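The segfault above occurs while mdrun is still scanning PME grids ("timed with pme grid ..."); one diagnostic step, in line with the -notunepme suggestion elsewhere in the thread, is to disable that tuning in the job script. A minimal sketch, with the -deffnm value as a placeholder since it is truncated in the fragment above:

```bash
# Disable the PME load-balancing grid scan; the coulomb cutoff and grid stay as set in the .tpr.
gmx mdrun -v -s $TPR -deffnm md_notune -notunepme
```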

Re: [gmx-users] Gromacs 2019.4 - cudaStreamSynchronize failed issue

2019-12-02 Thread Mark Abraham
Hi, What driver version is reported in the respective log files? Does the error persist if mdrun -notunepme is used?

Mark

On Mon., 2 Dec. 2019, 21:18 Chenou Zhang wrote:
> Hi Gromacs developers,
>
> I'm currently running gromacs 2019.4 on our university's HPC cluster. To
> fully utilize the
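The driver and runtime versions Mark asks about are printed in the hardware/GPU section near the top of the mdrun log; a hedged way to pull them out, assuming the default log name md.log:

```bash
# The GROMACS log header reports the CUDA driver/runtime it detected,
# and the GPU info block lists the devices.
grep -i -E "CUDA (driver|runtime)|GPU info" -A2 md.log

# Cross-check against the driver the node is actually running.
nvidia-smi --query-gpu=driver_version --format=csv,noheader
```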