Re: [gmx-users] What is the most reliable way to run repeats for reproducibility?

2018-01-11 Thread Justin Lemkul



On 1/10/18 9:59 AM, ZHANG Cheng wrote:

> Hi Mark,
> Thank you very much.
>
> 1) Regarding the link you provided: I do not think I can control most of the
> computer resources, as I submit my jobs to our cluster and they are
> distributed to whichever cores are available.
>
> 2) For the "random seed" of the velocities, I found this page and enabled
> this option:
> gen_vel = yes
> http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin/gmx-tutorials/lysozyme/06_equil.html
>
> So does that mean it is better to use the same em.tpr and run separate
> NVT, NPT, etc. for each repeat, so that each one is initialised with
> different velocities?


This is common practice. Minimize the system once (there is usually no 
variation here, since the coordinates always move downhill on the potential 
energy surface) and then start however many simulations you want, each 
with different starting velocities (which requires a different gen_seed 
value for each one). This generates independent simulations.
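
As a minimal sketch of the "different gen_seed per repeat" idea: the snippet below only builds the velocity-generation lines of an .mdp file for each repeat. The function name, the base seed of 1000 and the temperature of 300 K are illustrative assumptions, not values from the tutorial; the rest of the NVT settings would come from your own nvt.mdp.

```python
# Hypothetical helper: builds only the velocity-generation part of an .mdp file.
# gen_vel, gen_temp and gen_seed are real .mdp options; everything else here
# (names, default values) is an assumption made for illustration.
VELOCITY_BLOCK = """\
; velocity generation for repeat {index}
gen_vel  = yes
gen_temp = {temp}
gen_seed = {seed}
"""


def replica_velocity_blocks(n_replicas, temp=300, base_seed=1000):
    """Return one velocity-generation block per repeat, each with its own seed."""
    return [
        VELOCITY_BLOCK.format(index=i + 1, temp=temp, seed=base_seed + i)
        for i in range(n_replicas)
    ]


# Show the blocks that would be appended to each repeat's NVT .mdp file.
for block in replica_velocity_blocks(3):
    print(block)
```

Setting gen_seed = -1 instead lets grompp pick a pseudo-random seed, which also gives different velocities per repeat; writing explicit seeds just makes it easier to record which seed each repeat used.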




> 3) At which step does the "natural chaotic divergence during equilibration"
> show up?
>
> The link says: "The Central Limit Theorem tells us that in the case of
> infinitely long simulation all observables converge to their equilibrium
> values". But I think this "equilibrium" is not practical for a protein in MD.
> For example, if I run a protein at 370 K, it will ultimately unfold; like
> boiling an egg in water, it takes 10 min. But in MD the time scale is far
> shorter, usually a few hundred ns. We could "never" see the protein converge
> within such a short period.
>
> So my understanding is that "equilibrium" refers to the equilibration of
> temperature/pressure/density, but not of the protein itself. Is that correct?


Yes, quantities like temperature and pressure converge relatively 
quickly, but the dynamics of the system (e.g. the conformational sampling 
of the protein) tend to take much longer, often by orders of magnitude.


-Justin

--
==

Justin A. Lemkul, Ph.D.
Assistant Professor
Virginia Tech Department of Biochemistry

303 Engel Hall
340 West Campus Dr.
Blacksburg, VA 24061

jalem...@vt.edu | (540) 231-3129
http://www.biochem.vt.edu/people/faculty/JustinLemkul.html

==

--
Gromacs Users mailing list

* Please search the archive at 
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
mail to gmx-users-requ...@gromacs.org.


Re: [gmx-users] What is the most reliable way to run repeats for reproducibility?

2018-01-10 Thread ZHANG Cheng
Hi Mark,
Thank you very much.


1) Regarding the link you provided: I do not think I can control most of the 
computer resources, as I submit my jobs to our cluster and they are 
distributed to whichever cores are available.


2) For the "random seed" of the velocities, I found this page and enabled 
this option:
gen_vel = yes
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin/gmx-tutorials/lysozyme/06_equil.html


So does that mean it is better to use the same em.tpr and run separate 
NVT, NPT, etc. for each repeat, so that each one is initialised with 
different velocities?


3) At which step does the "natural chaotic divergence during equilibration" 
show up?


The link says: "The Central Limit Theorem tells us that in the case of 
infinitely long simulation all observables converge to their equilibrium 
values". But I think this "equilibrium" is not practical for a protein in MD. 
For example, if I run a protein at 370 K, it will ultimately unfold; like 
boiling an egg in water, it takes 10 min. But in MD the time scale is far 
shorter, usually a few hundred ns. We could "never" see the protein converge 
within such a short period.


So my understanding is that "equilibrium" refers to the equilibration of 
temperature/pressure/density, but not of the protein itself. Is that correct?
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin/gmx-tutorials/lysozyme/06_equil.html
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin/gmx-tutorials/lysozyme/07_equil2.html


Yours sincerely
Cheng






Re: [gmx-users] What is the most reliable way to run repeats for reproducibility?

2018-01-10 Thread Mark Abraham
Hi,

See http://www.gromacs.org/Documentation/Terminology/Reproducibility. Some
people think they want reproducibility of a trajectory, which is generally
not needed, and not consistent with highly efficient sampling (particularly
with GPUs involved).

But what you actually want is not reproducibility. The usual approach is to
change the random seed used when you first generate velocities, and rely on
the natural chaotic divergence during equilibration to lead to independent
sampling in the replicas.
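
To illustrate what "chaotic divergence" means here, the following is a toy sketch only. It uses the logistic map, a standard textbook chaotic system, and not an MD integrator; the starting value, perturbation size and step count are arbitrary assumptions. Two trajectories that start almost identically end up completely uncorrelated after a few dozen steps, which is the same qualitative behaviour that makes replicas with different initial velocities independent.

```python
def logistic_trajectory(x0, n_steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x), starting from x0."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def separation(x0, delta, n_steps):
    """Absolute difference, step by step, between two nearby trajectories."""
    a = logistic_trajectory(x0, n_steps)
    b = logistic_trajectory(x0 + delta, n_steps)
    return [abs(p - q) for p, q in zip(a, b)]


# Two "replicas" that differ by 1e-12 in their starting value.
sep = separation(0.3, 1e-12, 100)
for step in (0, 10, 20, 30, 40, 50):
    print(step, sep[step])
```

In a real simulation the perturbation comes from the different initial velocities (and from floating-point reordering in parallel runs), and the divergence plays out over the equilibration phase rather than over map iterations.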

Mark


[gmx-users] What is the most reliable way to run repeats for reproducibility?

2018-01-10 Thread ZHANG Cheng
Dear Gromacs,
I can think of different ways of running repeats, after reading Justin's 
lysozyme tutorial.


The 1st way: all repeats start from the same em.tpr after energy minimization 
(EM), and each repeat runs its own subsequent steps (NVT, NPT and production MD):
repeat 1: same em.tpr → NVT → NPT → md_0_1.tpr → production MD
repeat 2: same em.tpr → NVT → NPT → md_0_1.tpr → production MD
repeat 3: same em.tpr → NVT → NPT → md_0_1.tpr → production MD
..


The 2nd way: all repeats start from the same md_0_1.tpr and use it for 
separate production MD runs:
repeat 1: same em.tpr → same NVT → same NPT → same md_0_1.tpr → production MD
repeat 2: same md_0_1.tpr → production MD
repeat 3: same md_0_1.tpr → production MD
..



The 3rd way: all repeats start from the same checkpoint file within the 
production run and use it for the rest of the production MD:
repeat 1: same em.tpr → same NVT → same NPT → same md_0_1.tpr → same 
production MD for 50 ns → same .cpt file → production MD for another 200 ns
repeat 2: same .cpt file → production MD for another 200 ns
repeat 3: same .cpt file → production MD for another 200 ns
..



Of course, the 3rd way is the easiest. But does it mean the repeats may not 
cover enough conformations, since they tend to resemble each other more than 
with the 1st approach? Is there a standard way to handle repeats?


Thank you.


Yours sincerely
Cheng