[gmx-users] intermittent changes in energy drift following simulation restarts in v4.6.1

2013-09-09 Thread Richard Broadbent

Dear All,

I've been analysing a series of long (200 ns) NVE simulations (md 
integrator) on ~93,000-atom systems. I ran the simulations in groups of 3 
using the -multi option in GROMACS v4.6.1, double precision.


Simulations were run with 1 OpenMP thread per MPI process.

The simulations were restarted at regular intervals using the following 
submission script:



FILE=4.6_P84_DIO_

module load fftw xe-gromacs/4.6.1

# Change to the directory that the job was submitted from
cd $PBS_O_WORKDIR

# Extract the MPI process count and tasks-per-node from the PBS job
export NPROC=`qstat -f $PBS_JOBID | grep mppwidth | awk '{print $3}'`
export NTASK=`qstat -f $PBS_JOBID | grep mppnppn  | awk '{print $3}'`

# -cpi restarts from the checkpoint file; -append continues the existing output files
aprun -n $NPROC -N $NTASK mdrun_mpi_d -deffnm $FILE -maxh 24 -multi 3 \
      -npme 64 -append -cpi




###

The first simulation was run with the same script, except that the mdrun line was

aprun -n $NPROC -N $NTASK mdrun_mpi_d -deffnm $FILE -maxh 24 -multi 3 \
      -npme 64


###


The simulations generally ran and restarted without trouble; however, in 
several simulations the energy drift changed radically following a 
restart.


In one simulation, the run proceeded for 50 ns (including one restart) 
with a drift of -141.6 +/- 0.1 kJ mol^-1 ns^-1. It was then restarted and 
had a drift of +104 +/- 1 kJ mol^-1 ns^-1 for 15 ns, then was restarted 
again and continued with a drift of -138 +/- 0.1 kJ mol^-1 ns^-1 for a 
further 50 ns.


The other two simulations running in parallel with this calculation 
through the -multi option did not show any change in drift gradient.


The drifts were calculated by a least-squares fit to the total-energy 
data extracted with:


echo total | g_energy_d -f ${FILE}${i}.edr -o total_${FILE}${i}.xvg \
     -xvg none
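
For what it's worth, the least-squares slope can also be computed directly 
from the two-column .xvg output with awk; a minimal sketch, assuming time in 
ps in column 1 and total energy in kJ mol^-1 in column 2:

# Least-squares slope of total energy vs. time; the factor of 1000
# converts the slope from kJ mol^-1 ps^-1 to kJ mol^-1 ns^-1
awk '{ n++; sx += $1; sy += $2; sxx += $1*$1; sxy += $1*$2 }
     END { print 1000 * (n*sxy - sx*sy) / (n*sxx - sx*sx) }' total_${FILE}${i}.xvg

Filtering the input down to a window on one side of the restart first (e.g. 
with awk '$1 >= 46000 && $1 < 50000' for the 4 ns before a restart at 
t = 50 ns; times hypothetical) gives the before- and after-restart slopes 
separately.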



The simulation writes to the .edr file every 20 ps. The transition is 
masked by the expected oscillations in energy due to the integrator on a 
~2 ns interval, but the change in drift is clear when looking at a 4 ns 
range centred on the restart.


The hardware used was of the same specification for all jobs: 27 Cray 
XE6 nodes (9 nodes per simulation), 32 MPI processes per node.


The simulations use the Verlet cut-off scheme; H-bond constraints are 
enforced using LINCS (order 6, 2 iterations).


I can't think what would cause this change in the drift across a 
restart. However, I have seen it in simulations run on both an AMD 
system (Cray XE6, AVX-FMA) and an Intel system (SGI ICE, SSE4.1).



I have some data generated using the same procedure with v4.5.5 and 
v4.5.7 (different cut-off scheme), and the restarts in those runs have 
not caused any appreciable changes in the simulations.


Unfortunately I didn't save the checkpoint files used for the restarts (I 
will in the future). I'm going to try building a new input file from 
just before a restart using the .trr trajectory data.
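
The rebuild would be along these lines, assuming tpbconv's -f/-e/-time 
options; the time value and output name below are placeholders, not the 
actual ones:

# Build a continuation .tpr from the trajectory/energy frame just
# before the restart (the -time value in ps is a placeholder)
tpbconv_d -s ${FILE}${i}.tpr -f ${FILE}${i}.trr -e ${FILE}${i}.edr \
          -time 50000 -o ${FILE}${i}_rebuilt.tpr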



Does anyone have any ideas of what might have caused this?

Has anyone seen similar effects?

Thanks,

Richard


Re: [gmx-users] intermittent changes in energy drift following simulation restarts in v4.6.1

2013-09-09 Thread Mark Abraham
Sounds worrying :-( Thanks for the detailed report and
trouble-shooting! So far, I can't think of a reason for it.

A couple of suggestions:
* try again with 4.6.3 (at least while trouble-shooting) in case it's a bug that has since been fixed
* post a representative .mdp file
* is there anything out of the ordinary in the topology?
* if the problem is restart-related and shows up in the drift quickly,
then you can probably find a reproducible case via a job that does
lots of short-interval restarts and saves all the intermediate files -
a (set of) inputs that can reproduce the problem sounds like what we'd
need to diagnose and/or fix anything (a sketch of such a job follows
this list)
* does it happen in a non-multi simulation? (or more particularly,
what are you doing with -multi?)
* check .log files for warnings, and that there are none being
suppressed at the grompp stage
* see if the group cut-off scheme in 4.6.x shows the same problem
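
A minimal sketch of such a restart-hammering job, reusing the variable 
names from your submission script (the chunk count, -maxh 0.5, and the 
restart_$i archive directories are assumptions, not tested values):

# Run in short chunks, restarting from the checkpoint each time and
# archiving every intermediate file for later comparison
for i in `seq 1 20`; do
    aprun -n $NPROC -N $NTASK mdrun_mpi_d -deffnm $FILE -maxh 0.5 -multi 3 \
          -npme 64 -append -cpi
    mkdir -p restart_$i
    cp ${FILE}*.cpt ${FILE}*.edr ${FILE}*.log restart_$i/
done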

Mark




Re: [gmx-users] intermittent changes in energy drift following simulation restarts in v4.6.1

2013-09-09 Thread Richard Broadbent

Hi Mark,

Thanks for the quick response,

On 09/09/13 15:45, Mark Abraham wrote:

> Sounds worrying :-( Thanks for the detailed report and
> trouble-shooting! So far, I can't think of a reason for it.
>
> A couple of suggestions:
> * try again with 4.6.3 (at least while trouble-shooting) in case it's a bug that has since been fixed
I'll test that side by side with 4.6.1, so that we have both for 
comparison.

> * post a representative .mdp file

It's below this message. The production run is built using tpbconv -extend
on the .tpr built from that .mdp.
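
For concreteness, that workflow is roughly the following; the input file 
names and the extension length are placeholders:

# Build the initial .tpr, then extend it to the production length (ps)
grompp_d -f nve.mdp -c equilibrated.gro -p topol.top -o ${FILE}${i}.tpr
tpbconv_d -s ${FILE}${i}.tpr -extend 200000 -o ${FILE}${i}_prod.tpr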


> * is there anything out of the ordinary in the topology?
I built the residues myself, but they're just standard polymer monomer 
units; nothing out of the ordinary.



> * if the problem is restart-related and shows up in the drift quickly,
> then you can probably find a reproducible case via a job that does
> lots of short-interval restarts and saves all the intermediate files -
> a (set of) inputs that can reproduce the problem sounds like what we'd
> need to diagnose and/or fix anything

I'm already starting to build them; I will be testing them tomorrow.

> * does it happen in a non-multi simulation? (or more particularly,
> what are you doing with -multi?)
The -multi option was just used to move the job into a faster queue; 
I've seen the problem in non-multi jobs as well.

> * check .log files for warnings, and that there are none being
> suppressed at the grompp stage

There are no errors at the grompp stage. I haven't identified any 
warnings in the mdrun logs, but I'm going to have another look before 
I'm 100% certain there aren't any in there; I couldn't see any on a 
first look through.



> * see if the group cut-off scheme in 4.6.x shows the same problem


Will do


> Mark



Thanks,

Richard


integrator = md
bd_fric = 0

dt = 0.002

nsteps = 250

comm_mode = linear

nstxout = 10
nstvout = 10
nstfout = 0

xtc_grps = P84
nstxtcout = 5

nstlog = 10

nstenergy = 5

pbc = xyz
periodic_molecules = no

ns_type = grid
nstlist = 10

rlist = 1.25
optimize_fft = yes
fourier_nx = 128
fourier_ny = 128
fourier_nz = 128

pme_order   = 4
epsilon_r   = 1.0

coulombtype = pme
coulomb-modifier = Potential-shift-Verlet
rcoulomb = 1.2

vdwtype = cut-off
vdw-modifier = Potential-shift-Verlet

rvdw = 1.20

DispCorr = EnerPres

tcoupl = no

nsttcouple = 5

pcoupl = no

constraints = h-bonds

lincs_order = 6
lincs_iter = 2

cutoff-scheme = Verlet
verlet-buffer-drift = -1






Re: [gmx-users] intermittent changes in energy drift following simulation restarts in v4.6.1

2013-09-09 Thread Mark Abraham
No obvious problems. Please open an issue at redmine.gromacs.org when you
have something reproducible, but don't hurry; nobody's likely to have time
to check it out for a week or two.

Cheers,

Mark