Re: [gmx-users] How does g_msd calculates MSD?
Hello Julian. I've not gone though the C-code neither. However, from what I have read and understood about MSD calculation, both calculations you proposed should not give the same results. This is the way it is calculated. 1. You take different starting times over tyour trajectory, every -trestart picoseconds, from -b (let's take 0ps in this explanation) to -e. 2. From every restarting point (tr) you calculate the MSD, by calculating the diplacement of atomic positions compared to the position at the corresponding tr. 3. With the previous result you'll have a lot of MSD curves, corresponding to each restating point, e.g. one where t0=0ps, other t0=0+trestart, and so on. Actually, you are refering all curves to its corresponding t0, so you have all of them starting at 0. 4. The average of all of these curves is done. With that you obtain properly averaged results (which is essential for a good MSD calculation). However, note that only the restarting point at 0ps of your original trayectory (or the -b time you selected) will contain all the points of the curve since, for example, the last restating point wil conntribute with the few points remaining to the end of the simulation (-e time). For that reason, the average is made with less points as the time from the reference is increased, and at the end of the curve, the results are not well averaged any more. Hope that helps. Javier El 15/10/10 23:58, Julian escribió: Hi everybody, We're trying to understand how gromacs g_msd calculates MSD (without reading through the C code, I really don't know much about it, and there are hundreds of lines). (What we want to do is to calculate MSD properly in our supercooled water simulations, i.e., to choose correctly the values for -b, -e and -trestart in order to get the longest, reliable MSD data) We found a mail from Gaurav Goel (quoted below) which gives a reasonable explanation on the topic. If we understand this correctly, then the output of g_msd -b 50 -e 100 -trestart 50 should be the same as the second half (with the proper shifting) of the output of g_msd -b 0 -e 100 -trestart 50 But it is not, as anyone can verify with any simulation. .. what are we missing? Thanks in advance, Julian -- Julian Gelman Constantin Department of Inorganic, Analytic and Chemical Physics (DQIAQF) School of Exact and Natural Sciences University of Buenos Aires, Argentina --- Gaurav Goel gauravgoeluta at gmail.com Tue Jul 13 00:56:02 CEST 2010 On Mon, Jul 12, 2010 at 5:22 PM, Ricardo Cuya Guizado rcuyag at hotmail.com wrote: Dear gromacs users I make a MD of 20 ns of a solute in water With the g_msd program the msd vs the time was obtained In the plot, I observed a linear behaviour of the MSD from 0 to 15 ns and a plateau with no linear tendence at the last 5 ns arpoximately. In order to know if the observed plateau was due to the data or is due to the way as the algorithm process the data, I divided the MD in two trajectories and obtained the msd for each one. From 0-10ns, the plot observed shows a linear tendence en the begining and a plateau with no linear tendence from 9 to 10 ns. From 10-20 ns the plot observed was linear from 10 to 18 ns and not linear at the last, the same plateau was observed. Comparing the plots there are not equivalent,. Why g_msd produces a non linear plot at the last of the calculation and the plateau is ever produces. Somebody will explain the way as the g_msd algorithm work? and why the plot are no equivalent or why there must be equivalent? I will explain how the g_msd algorithm works and hopefully that will answer all your questions above. What you see in the output file is average-MSD versus time. This average is done over all the particles in the group you selected and over multiple time origins (this last option can be selected with the -trestart parameter). Also, time in column 1 is time difference from the start of your trajectory to current time. E.g., let's say you collected a trajectory over 5 time units and choose -trestart=1 time unit and -dt=1 time unit. dt=1 means you'll have 6 configurations for your analysis (including the configuration at t=0). trestart=1 means you'll have 5 distinct trajectories for your analysis: Trajectory 1: 0-5 T2: 1-5 T3: 2-5 T4: 3-5 T5: 4-5 Now you can notice that all 5 trajectories contribute to the average MSD after 1 time unit (T1-T5), 4 trajectories contribute to the average MSD after 2 time units (T1-T4), 3 trajectories to the average MSD after 3 time units (T1-T3), , and only one trajectory to the MSD after 5 time units (T1). Of course, this assumes that trestart is large enough that all all these trajectories are uncorrelated. So, it's clear that longer the time interval at which you want to evaluate the MSD lesser the number of trajectories used to evaluate it...and hence, higher error in MSD values at longer times. That might explain deviation from linear
[gmx-users] How does g_msd calculates MSD?
Hi everybody, We're trying to understand how gromacs g_msd calculates MSD (without reading through the C code, I really don't know much about it, and there are hundreds of lines). (What we want to do is to calculate MSD properly in our supercooled water simulations, i.e., to choose correctly the values for -b, -e and -trestart in order to get the longest, reliable MSD data) We found a mail from Gaurav Goel (quoted below) which gives a reasonable explanation on the topic. If we understand this correctly, then the output of g_msd -b 50 -e 100 -trestart 50 should be the same as the second half (with the proper shifting) of the output of g_msd -b 0 -e 100 -trestart 50 But it is not, as anyone can verify with any simulation. .. what are we missing? Thanks in advance, Julian -- Julian Gelman Constantin Department of Inorganic, Analytic and Chemical Physics (DQIAQF) School of Exact and Natural Sciences University of Buenos Aires, Argentina --- Gaurav Goel gauravgoeluta at gmail.com Tue Jul 13 00:56:02 CEST 2010 On Mon, Jul 12, 2010 at 5:22 PM, Ricardo Cuya Guizado rcuyag at hotmail.com wrote: Dear gromacs users I make a MD of 20 ns of a solute in water With the g_msd program the msd vs the time was obtained In the plot, I observed a linear behaviour of the MSD from 0 to 15 ns and a plateau with no linear tendence at the last 5 ns arpoximately. In order to know if the observed plateau was due to the data or is due to the way as the algorithm process the data, I divided the MD in two trajectories and obtained the msd for each one. From 0-10ns, the plot observed shows a linear tendence en the begining and a plateau with no linear tendence from 9 to 10 ns. From 10-20 ns the plot observed was linear from 10 to 18 ns and not linear at the last, the same plateau was observed. Comparing the plots there are not equivalent,. Why g_msd produces a non linear plot at the last of the calculation and the plateau is ever produces. Somebody will explain the way as the g_msd algorithm work? and why the plot are no equivalent or why there must be equivalent? I will explain how the g_msd algorithm works and hopefully that will answer all your questions above. What you see in the output file is average-MSD versus time. This average is done over all the particles in the group you selected and over multiple time origins (this last option can be selected with the -trestart parameter). Also, time in column 1 is time difference from the start of your trajectory to current time. E.g., let's say you collected a trajectory over 5 time units and choose -trestart=1 time unit and -dt=1 time unit. dt=1 means you'll have 6 configurations for your analysis (including the configuration at t=0). trestart=1 means you'll have 5 distinct trajectories for your analysis: Trajectory 1: 0-5 T2: 1-5 T3: 2-5 T4: 3-5 T5: 4-5 Now you can notice that all 5 trajectories contribute to the average MSD after 1 time unit (T1-T5), 4 trajectories contribute to the average MSD after 2 time units (T1-T4), 3 trajectories to the average MSD after 3 time units (T1-T3), , and only one trajectory to the MSD after 5 time units (T1). Of course, this assumes that trestart is large enough that all all these trajectories are uncorrelated. So, it's clear that longer the time interval at which you want to evaluate the MSD lesser the number of trajectories used to evaluate it...and hence, higher error in MSD values at longer times. That might explain deviation from linear behaviour at long times. However, you must be careful in interpreting the MSD data and I recommend reading some literature on the subject. A plateau in MSD versus time data might also signify what is called cage motion, in which a particle or atom is trapped by the surrounding particles and is not able to move out of that hole on the simulation time scale. If you want you can send me your MSD versus time data along with some information on your system (such as potentials, density, temperature etc.) and I can let you know my comments. Few words of caution: Make sure that the center of mass of your particle (or atom or molecule) is diffusing several particle diameters. Also, make sure that you're calculating the self-diffusion coefficient by fitting a straight line to the linear region of MSD versus time data. You can either modify the -beginfit and -endfit options... or calculate the slope of the MSD versus time data using some other software (e.g., gnuplot, excel, etc.). If you're doing the latter you'll need to take a look at the code in gmx_msd.c to know how the diffusion coefficent is calculated from the slope of MSD versus time data (tog et correct units, use proper scaling factors, etc.). I hope that helped. -Gaurav Regards Ricardo Cuya -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before