Re: [gmx-users] maxh mdrun option does not work with REMD simulation

2016-04-05 Thread Mark Abraham
Hi,

It should work, but apparently doesn't. Please open an issue at
redmine.gromacs.org and include a tarball of your tpr files so we can see
the problem happen and fix it.

Meanwhile, I suggest using gmx grompp to construct a run that will
complete within the maximum time you can run, and then gmx convert-tpr
(tpbconv in 4.6.x) to extend it suitably for the next phase.
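
A rough sketch of how such a segment could be sized, using figures quoted elsewhere in this thread (about 300 ns per 24 h, a 2 fs time step implied by "12000 steps, 24.0 ps", -replex 500, -multi 8). The 22 hour budget and the mdA_next_ output names are assumptions for illustration only, and the script prints the gmx convert-tpr commands instead of running them.

```shell
#!/bin/sh
# Sketch only: size one REMD segment to fit a wall-time budget, then print
# (not run) the commands that would extend each replica's tpr by that amount.
set -eu

ns_per_day=300     # throughput reported in the thread (~300 ns per 24 h)
budget_hours=22    # assumption: keep 2 h of margin inside the 24 h limit
dt_fs=2            # time step implied by "12000 steps, 24.0 ps"
replex=500         # exchange interval from the mdrun command line
replicas=8         # from -multi 8
total_ns=2400      # target length: 2.4 microseconds

# Steps that fit in the budget (integer arithmetic, so small rounding-down
# is possible), trimmed to a whole number of exchange intervals.
raw_steps=$(( ns_per_day * 1000000 / 24 * budget_hours / dt_fs ))
nsteps=$(( raw_steps - raw_steps % replex ))
chunk_ps=$(( nsteps * dt_fs / 1000 ))

# Number of segments needed to reach the target length (rounded up).
segments=$(( (total_ns * 1000 + chunk_ps - 1) / chunk_ps ))

echo "first segment: set nsteps = $nsteps in each replica's .mdp before gmx grompp"
echo "segment length: $chunk_ps ps, segments needed: $segments"

# Hypothetical file names: mdA_<i>.tpr in, mdA_next_<i>.tpr out.
i=0
while [ "$i" -lt "$replicas" ]; do
    echo "gmx convert-tpr -s mdA_${i}.tpr -extend ${chunk_ps} -o mdA_next_${i}.tpr"
    i=$(( i + 1 ))
done
```

Because each segment then ends on its own when nsteps is reached, this route does not rely on -maxh to stop the run.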

Mark

On Tue, 5 Apr 2016 16:51 Maud Jusot wrote:

> Hello again
>
> I still can't restart my REMD simulation correctly (see my previous
> message), but I can add some details. When it restarts, GROMACS
> doesn't create new checkpoint files (whatever the -cpt option is set
> to) and doesn't stop at the maxh time. I tried 3 different versions of
> GROMACS (4.6.5, 5.1.0 and 5.1.2) on two different clusters, so I am
> quite sure the problem comes neither from the installation nor from
> the version.
>
> This is a big issue for me because I am trying to run a 2.4 microsecond
> simulation, and on the cluster I use I can't run for more than 24 h
> (which corresponds to approximately 300 ns) without restarting. So
> without a checkpoint file I am unable to relaunch my simulation.
>
> Is there something wrong in what I am doing?
> Does anybody have an idea, or do you think it's a bug and I should
> write to the developers mailing list?
>
> Thanks,
>
> Maud

Re: [gmx-users] maxh mdrun option does not work with REMD simulation

2016-04-05 Thread Maud Jusot

Hello again

I still can't restart my REMD simulation correctly (see my previous
message), but I can add some details. When it restarts, GROMACS
doesn't create new checkpoint files (whatever the -cpt option is set
to) and doesn't stop at the maxh time. I tried 3 different versions of
GROMACS (4.6.5, 5.1.0 and 5.1.2) on two different clusters, so I am
quite sure the problem comes neither from the installation nor from
the version.


This is a big issue for me because I am trying to run a 2.4 microsecond
simulation, and on the cluster I use I can't run for more than 24 h
(which corresponds to approximately 300 ns) without restarting. So
without a checkpoint file I am unable to relaunch my simulation.


Is there something wrong in what I am doing?
Does anybody have an idea, or do you think it's a bug and I should
write to the developers mailing list?


Thanks,

Maud

Le 29/03/2016 11:33, Maud Jusot a écrit :

Dear Gromacs users,

I tried to run a REMD simulation with GROMACS 5.1 that is relaunched
every hour (in a queuing system) with the -maxh option.
The first time it was launched, it worked: the run stopped at the maxh
time, was relaunched from the checkpoint files and continued the
simulation. But during this second run, when the maxh time was
reached (step 1981062), GROMACS said that it was going to stop, but it
did not stop until the system killed the job (step 2545600).


I tried different maxh times (1, 0.95 and 0.20 hours) to be sure that
the gap between maxh and the cluster time limit was sufficient, but in
every case the second run continued until it reached the one hour limit
and was killed by the system.


I find it very strange that it works the first time, and that the
second time GROMACS says that it has to stop but does not.
Moreover, I tried the same job with a classical simulation
(without REMD) and this time there was no problem.
Did I forget an option or something like that to make maxh
compatible with REMD?
I searched the web and the mailing list but did not find any
reported problems between maxh and REMD.


Do you have any idea what the problem is?

Here are the command lines in my script myJob.slurm:
-
srun --mpi=pmi2 -K1 --resv-ports -n $SLURM_NTASKS mdrun_mpi -ntomp 1 \
    -multi 8 -replex 500 -maxh 0.2 -deffnm mdA_ -cpi mdA_.cpt \
    -cpo mdA_.cpt -v 2>> remdA.log

# resubmit the same job at the end for a long run:
sbatch myJob.slurm
-
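
Since the symptom reported above is that no new checkpoint gets written on restart, an unconditional sbatch at the end of the script would keep re-running the same segment. One possible guard, sketched here under assumptions, is to resubmit only if every replica's checkpoint is newer than a marker file created when the job started. The helper name checkpoints_advanced and the marker file are hypothetical, and the per-replica names mdA_0.cpt, mdA_1.cpt and so on are assumed from -multi 8 with -deffnm mdA_.

```shell
#!/bin/sh
# Sketch only: decide whether to resubmit, based on whether every replica
# wrote a checkpoint after this job started.
set -u

# checkpoints_advanced DIR MARKER NREPLICAS
# Returns 0 only if each DIR/mdA_<i>.cpt exists and is newer than MARKER.
checkpoints_advanced() {
    ca_dir=$1
    ca_marker=$2
    ca_n=$3
    ca_i=0
    while [ "$ca_i" -lt "$ca_n" ]; do
        [ "$ca_dir/mdA_${ca_i}.cpt" -nt "$ca_marker" ] || return 1
        ca_i=$(( ca_i + 1 ))
    done
    return 0
}

# Intended use inside myJob.slurm (commented out; needs Slurm and GROMACS):
#   marker=$(mktemp)                      # timestamp taken at job start
#   ... the srun mdrun_mpi command from the script above goes here ...
#   if checkpoints_advanced . "$marker" 8; then
#       sbatch myJob.slurm
#   else
#       echo "checkpoints did not advance; not resubmitting" >&2
#   fi
```

This does not fix the underlying problem; it only stops the resubmission loop from repeating a segment that made no recorded progress.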

Here is part of my remdA.log file:
-
starting mdrun 'myPeptide'
starting mdrun 'myPeptide'
12000 steps, 24.0 ps (continuing from step 655701, 1311.4 ps).
starting mdrun 'myPeptide'
12000 steps, 24.0 ps (continuing from step 655701, 1311.4 ps).
starting mdrun 'myPeptide'
12000 steps, 24.0 ps (continuing from step 655701, 1311.4 ps).
starting mdrun 'myPeptide'
12000 steps, 24.0 ps (continuing from step 655701, 1311.4 ps).
starting mdrun 'myPeptide'
12000 steps, 24.0 ps (continuing from step 655701, 1311.4 ps).
starting mdrun 'myPeptide'
12000 steps, 24.0 ps (continuing from step 655701, 1311.4 ps).
starting mdrun 'myPeptide'
12000 steps, 24.0 ps (continuing from step 655701, 1311.4 ps).
12000 steps, 24.0 ps (continuing from step 655701, 1311.4 ps).

Step 1981061: Run time exceeded 0.198 hours, will terminate the run

Step 1981062: Run time exceeded 0.198 hours, will terminate the run

Step 1981062: Run time exceeded 0.198 hours, will terminate the run

Step 1981062: Run time exceeded 0.198 hours, will terminate the run

Step 1981062: Run time exceeded 0.198 hours, will terminate the run

Step 1981062: Run time exceeded 0.198 hours, will terminate the run

Step 1981062: Run time exceeded 0.198 hours, will terminate the run

Step 1981062: Run time exceeded 0.198 hours, will terminate the run

step 1981100, will finish Sat Mar 26 11:14:40 2016
step 1981200, will finish Sat Mar 26 11:14:40 2016
...
step 2545600, will finish Sat Mar 26 11:15:49 2016srun: Job step 
aborted: Waiting up to 32 seconds for job step to finish.


Received the TERM signal, stopping at the next NS step

Received the TERM signal, stopping at the next NS step

Received the TERM signal, stopping at the next NS step

Received the TERM signal, stopping at the next NS step

Received the TERM signal, stopping at the next NS step

Received the TERM signal, stopping at the next NS step

Received the TERM signal, stopping at the next NS step

Received the TERM signal, stopping at the next NS step
-
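
To put a number on how far the run went past its stop request, the log above can be summarised with a short awk filter. This is only a sketch: the function name overshoot is hypothetical, and the patterns are taken from the two message formats visible in this excerpt ("Step N: Run time exceeded" and "step N, will finish").

```shell
#!/bin/sh
# Sketch only: measure how far a run went past its first -maxh stop request.
set -eu

# Reads an mdrun -v log on stdin and prints
# "<first_stop_step> <last_reported_step> <overshoot_in_steps>".
overshoot() {
    awk '
        # First "will terminate" message: remember the step it was issued at.
        /^Step [0-9]+: Run time exceeded/ && stop == "" { sub(":", "", $2); stop = $2 }
        # Progress lines: keep overwriting so the last one wins.
        /^step [0-9]+, will finish/ { sub(",", "", $2); last = $2 }
        END { if (stop != "" && last != "") print stop, last, last - stop }
    '
}
```

Called as `overshoot < remdA.log`, this would report the step of the first stop request, the last step reported, and their difference. For the excerpt above that difference is 564539 steps, roughly 1.1 ns at the 2 fs time step implied by the log.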

Thanks a lot,

Maud






