Re: [gmx-users] How does g_msd calculates MSD?

2010-10-18 Thread Javier Cerezo

Hello Julian.

I've not gone though the C-code neither. However, from what I have read 
and understood about MSD calculation, both calculations you proposed 
should not give the same results. This is the way it is calculated.


1. You take different starting times over tyour trajectory, every 
-trestart picoseconds, from -b (let's take 0ps in this explanation) to -e.


2. From every restarting point (tr) you calculate the MSD, by 
calculating the diplacement of atomic positions compared to the position 
at the corresponding tr.


3. With the previous result you'll have a lot of MSD curves, 
corresponding to each restating point, e.g. one where t0=0ps, other 
t0=0+trestart, and so on. Actually, you are refering all curves to its 
corresponding t0, so you have all of them starting at 0.


4. The average of all of these curves is done. With that you obtain 
properly averaged results (which is essential for a good MSD 
calculation). However, note that only the restarting point at 0ps of 
your original trayectory (or the -b time you selected) will contain all 
the points of the curve since, for example, the last restating point wil 
conntribute with the few points remaining to the end of the simulation 
(-e time). For that reason, the average is made with less points as the 
time from the reference is increased, and at the end of the curve, the 
results are not well averaged any more.


Hope that helps.

Javier

El 15/10/10 23:58, Julian escribió:

Hi everybody,

We're trying to understand how gromacs g_msd calculates MSD (without 
reading through the C code, I really don't know much about it, and 
there are hundreds of lines).
(What we want to do is to calculate MSD properly in our supercooled 
water simulations, i.e., to choose correctly the values for -b, -e  
and -trestart in order to get the longest, reliable MSD data)



We found a mail from Gaurav Goel (quoted below) which gives a 
reasonable explanation on the topic.


If we understand this correctly, then the output of

g_msd -b 50 -e 100 -trestart 50

should be the same as the second half (with the proper shifting) of
the output of

g_msd -b 0 -e 100 -trestart 50

But it is not, as anyone can verify with any simulation.
.. what are we missing?

Thanks in advance,
Julian

--
Julian Gelman Constantin
Department of Inorganic, Analytic and Chemical Physics (DQIAQF)
School of Exact and Natural Sciences
University of Buenos Aires, Argentina


---
Gaurav Goel gauravgoeluta at gmail.com
Tue Jul 13 00:56:02 CEST 2010

On Mon, Jul 12, 2010 at 5:22 PM, Ricardo Cuya Guizado
rcuyag at hotmail.com wrote:
 Dear gromacs users
 I make a MD of 20 ns of a solute in water
 With the g_msd program the msd vs the time was obtained
 In the plot, I observed a linear behaviour of the MSD from 0 to 15 
ns and a

 plateau with no linear tendence at the last 5 ns arpoximately.
 In order to know if the observed plateau was due to the data or is 
due to

 the way as the algorithm process the data, I divided the MD in two
 trajectories and obtained the msd for each one.
 From 0-10ns, the plot observed shows a linear tendence en the begining
and a
 plateau with no linear tendence from 9 to 10 ns.
 From 10-20 ns the plot observed was linear from 10 to 18 ns and not 
linear

 at the last, the same plateau was observed.
 Comparing the plots there are not equivalent,.
 Why g_msd produces a non linear plot at the last of the calculation 
and the

 plateau is ever produces.
 Somebody will explain the way as the g_msd algorithm work? and why 
the plot

 are no equivalent or why there must be equivalent?

I will explain how the g_msd algorithm works and hopefully that will
answer all your questions above. What you see in the output file is
average-MSD versus time. This average is done over all the particles
in the group you selected and over multiple time origins (this last
option can be selected with the -trestart parameter). Also, time in
column 1 is time difference from the start of your trajectory to
current time.

E.g., let's say you collected a trajectory over 5 time units and
choose -trestart=1 time unit and -dt=1 time unit.

dt=1 means you'll have 6 configurations for your analysis (including
the configuration at t=0).

trestart=1 means you'll have 5 distinct trajectories for your analysis:
Trajectory 1: 0-5
T2: 1-5
T3: 2-5
T4: 3-5
T5: 4-5

Now you can notice that all 5 trajectories contribute to the average
MSD after 1 time unit (T1-T5), 4 trajectories contribute to the
average MSD after 2 time units (T1-T4),  3 trajectories to the average
MSD after 3 time units (T1-T3), , and only one trajectory to the
MSD after 5 time units (T1). Of course, this assumes that trestart is
large enough that all all these trajectories are uncorrelated.

So, it's clear that longer the time interval at which you want to
evaluate the MSD lesser the number of trajectories used to evaluate
it...and hence, higher error in MSD values at longer times. That might
explain deviation from linear 

[gmx-users] How does g_msd calculates MSD?

2010-10-15 Thread Julian
Hi everybody,

We're trying to understand how gromacs g_msd calculates MSD (without reading 
through the C code, I really don't know much about it, and there are hundreds 
of 
lines).
(What we want to do is to calculate MSD properly in our supercooled water 
simulations, i.e., to choose correctly the values for -b, -e  and -trestart in 
order to get the longest, reliable MSD data)


We found a mail from Gaurav Goel (quoted below) which gives a reasonable 
explanation on the topic.

If we understand this correctly, then the output of

g_msd -b 50 -e 100 -trestart 50

should be the same as the second half (with the proper shifting) of  
the output of

g_msd -b 0 -e 100 -trestart 50

But it is not, as anyone can verify with any simulation.
.. what are we missing?

Thanks in advance,
Julian

--
Julian Gelman Constantin 
Department of Inorganic, Analytic and Chemical Physics (DQIAQF) 
School of Exact and Natural Sciences 
University of Buenos Aires, Argentina


---
Gaurav Goel gauravgoeluta at gmail.com
Tue Jul 13 00:56:02 CEST 2010

On Mon, Jul 12, 2010 at 5:22 PM, Ricardo Cuya Guizado
rcuyag at hotmail.com wrote:
 Dear gromacs users
 I make a MD of 20 ns of a solute in water
 With the g_msd program the msd vs the time was obtained
 In the plot, I observed a linear behaviour of the MSD from 0 to 15 ns and a
 plateau with no linear tendence at the last 5 ns arpoximately.
 In order to know if the observed plateau was due to the data or is due to
 the way as the algorithm process the data, I divided the MD in two
 trajectories and obtained the msd for each one.
 From 0-10ns, the plot observed shows a linear tendence en the begining
and a
 plateau with no linear tendence from 9 to 10 ns.
 From 10-20 ns the plot observed was linear from 10 to 18 ns and not linear
 at the last, the same plateau was observed.
 Comparing the plots there are not equivalent,.
 Why g_msd produces a non linear plot at the last of the calculation and the
 plateau is ever produces.
 Somebody will explain the way as the g_msd algorithm work? and why the plot
 are no equivalent or why there must be equivalent?

I will explain how the g_msd algorithm works and hopefully that will
answer all your questions above. What you see in the output file is
average-MSD versus time. This average is done over all the particles
in the group you selected and over multiple time origins (this last
option can be selected with the -trestart parameter). Also, time in
column 1 is time difference from the start of your trajectory to
current time.

E.g., let's say you collected a trajectory over 5 time units and
choose -trestart=1 time unit and -dt=1 time unit.

dt=1 means you'll have 6 configurations for your analysis (including
the configuration at t=0).

trestart=1 means you'll have 5 distinct trajectories for your analysis:
Trajectory 1: 0-5
T2: 1-5
T3: 2-5
T4: 3-5
T5: 4-5

Now you can notice that all 5 trajectories contribute to the average
MSD after 1 time unit (T1-T5), 4 trajectories contribute to the
average MSD after 2 time units (T1-T4),  3 trajectories to the average
MSD after 3 time units (T1-T3), , and only one trajectory to the
MSD after 5 time units (T1). Of course, this assumes that trestart is
large enough that all all these trajectories are uncorrelated.

So, it's clear that longer the time interval at which you want to
evaluate the MSD lesser the number of trajectories used to evaluate
it...and hence, higher error in MSD values at longer times. That might
explain deviation from linear behaviour at long times.

However, you must be careful in interpreting the MSD data and I
recommend reading some literature on the subject. A plateau in MSD
versus time data might also signify what is called cage motion, in
which a particle or atom is trapped by the surrounding particles and
is not able to move out of that hole on the simulation time scale. If
you want you can send me your MSD versus time data along with some
information on your system (such as potentials, density, temperature
etc.) and I can let you know my comments.

Few words of caution:
Make sure that the center of mass of your particle (or atom or
molecule) is diffusing several particle diameters. Also, make sure
that you're calculating the self-diffusion coefficient  by fitting a
straight line to the linear region of MSD versus time data. You can
either modify the -beginfit and -endfit options... or calculate the
slope of the MSD versus time data using some other software (e.g.,
gnuplot, excel, etc.). If you're doing the latter you'll need to take
a look at the code in gmx_msd.c to know how the diffusion coefficent
is calculated from the slope of MSD versus time data (tog et correct
units, use proper scaling factors, etc.).

I hope that helped.

-Gaurav




 Regards
 Ricardo Cuya



  -- 
gmx-users mailing listgmx-users@gromacs.org
http://lists.gromacs.org/mailman/listinfo/gmx-users
Please search the archive at 
http://www.gromacs.org/Support/Mailing_Lists/Search before