Re: [gmx-users] Why does the -append option exist?
Hi, yes that helps a lot. One more question: what filesystem on Hopper 2 are you using for this test (home, scratch or proj, to see if it is Lustre or GPFS)? And are you running the test on the login node or on a compute node?

Thanks,
Roland

On Wed, Jun 8, 2011 at 1:17 PM, Dimitar Pachov dpac...@brandeis.edu wrote:

Hello,

On Wed, Jun 8, 2011 at 4:21 AM, Sander Pronk pr...@cbr.su.se wrote:
> Hi Dimitar, Thanks for the bug report. Would you mind trying the test
> program I attached on the same file system that you get the truncated
> files on? Compile it with: gcc testje.c -o testio

Yes, but no problem:

[dpachov@login-0-0 NEWTEST]$ ./testio
TEST PASSED: ftell gives: 46

As for the other questions:

HPC OS version:

[dpachov@login-0-0 NEWTEST]$ uname -a
Linux login-0-0.local 2.6.18-194.17.1.el5xen #1 SMP Mon Sep 20 07:20:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
[dpachov@login-0-0 NEWTEST]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.2 (Tikanga)

GROMACS 4.5.4 built with:

module purge
module load INTEL/intel-12.0
module load OPENMPI/1.4.3_INTEL_12.0
module load FFTW/2.1.5-INTEL_12.0   # not needed

## GROMACS settings
export CC=mpicc
export F77=mpif77
export CXX=mpic++
export FC=mpif90
export F90=mpif90

make distclean
echo "XXX building single prec XX"
./configure --prefix=/home/dpachov/mymodules/GROMACS/EXEC/4.5.4-INTEL_12.0/SINGLE \
  --enable-mpi \
  --enable-shared \
  --program-prefix= --program-suffix= \
  --enable-float --disable-fortran \
  --with-fft=mkl \
  --with-external-blas \
  --with-external-lapack \
  --with-gsl \
  --without-x \
  CFLAGS="-O3 -funroll-all-loops" \
  FFLAGS="-O3 -funroll-all-loops" \
  CPPFLAGS="-I${MPI_INCLUDE} -I${MKL_INCLUDE}" \
  LDFLAGS="-L${MPI_LIB} -L${MKL_LIB} -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -liomp5"
make -j 8
make install

Just did the same test on Hopper 2 (http://www.nersc.gov/users/computational-systems/hopper/) with their built GROMACS 4.5.3 (gromacs/4.5.3(default)), and the result was the same as reported earlier.
You could do the test there as well, if you have access, and see what you would get. Hope that helps a bit.

Thanks,
Dimitar

Sander

On Jun 7, 2011, at 23:21, Dimitar Pachov wrote:

Hello,

Just a quick update after a few short tests we (my colleague and I) quickly did. First, using "You can emulate this yourself by calling 'sleep 10s' before mdrun and see if that's long enough to solve the latency issue in your case" doesn't work, mainly because it doesn't seem to be a latency issue, but also because the load on a node is not affected by "sleep".

However, you can reproduce the behavior I have observed pretty easily. It seems to be related to the values of the pointers to the *xtc, *trr, *edr, etc. files written at the end of the checkpoint file after abrupt crashes, AND to the frequency of access (opening) of those files. How to test:

1. In your input *mdp file, set a high frequency of saving coordinates to, say, the *xtc file (10, for example) and a low frequency for the *trr file (10,000, for example).
2. Run GROMACS (mdrun -s run.tpr -v -cpi -deffnm run).
3. Kill the run abruptly shortly after that (say, after 10-100 steps).
4. You should have a few frames written in the *xtc file, and only one (the first) in the *trr file. The *cpt file should have nonzero values of "file_offset_low" for all of these files (the pointers have been updated).
5. Restart GROMACS (mdrun -s run.tpr -v -cpi -deffnm run).
6. Kill the run abruptly shortly after that (say, after 10-100 steps). Note that the frequency for accessing/writing the *trr file has not been reached.
7. You should have a few additional frames written in the *xtc file, while the *trr file will still have only one frame (the first). The *cpt file now has updated all "file_offset_low" pointer values, BUT the pointer to the *trr file has acquired a value of 0. Obviously, we already know what will happen if we restart again from this last *cpt file.
8. Restart GROMACS (mdrun -s run.tpr -v -cpi -deffnm run).
9. Kill it.
10. The *trr file now has size zero.

Therefore, if a run is killed before the files are accessed for writing (depending on the chosen frequency), the file offset values recorded in the *cpt file don't seem to be updated accordingly, and hence a new restart inevitably leads to overwritten output files. Do you think this is fixable?

Thanks,
Dimitar

On Sun, Jun 5, 2011 at 6:20 PM, Roland Schulz rol...@utk.edu wrote:

Two comments about the discussion:

1) I agree that buffered output (kernel buffers, not application buffers) should not affect I/O. If it does, it should be filed as a bug against the OS. Maybe someone can write a short test application which tries to reproduce this idea: writing to a file from one node and, immediately after the test program is killed on one node, writing to it from some
Re: [gmx-users] Why does the -append option exist?
Hi,

On Thu, Jun 9, 2011 at 2:55 AM, Roland Schulz rol...@utk.edu wrote:
> Hi, yes that helps a lot. One more question. What filesystem on Hopper 2
> are you using for this test (home, scratch or proj, to see if it is
> Lustre or GPFS)?

I used home.

> And are you running the test on the login node or on the compute node?

I did the test on the debug queue, so it was a compute node. Let me know if you need more info.

Best,
Dimitar
Re: [gmx-users] Why does the -append option exist?
Hi Dimitar,

Thanks for the bug report. Would you mind trying the test program I attached on the same file system that you get the truncated files on? Compile it with: gcc testje.c -o testio

Sander

testje.c
Description: Binary data
Re: [gmx-users] Why does the -append option exist?
Hello,

On Wed, Jun 8, 2011 at 4:21 AM, Sander Pronk pr...@cbr.su.se wrote:
> Hi Dimitar, Thanks for the bug report. Would you mind trying the test
> program I attached on the same file system that you get the truncated
> files on? Compile it with: gcc testje.c -o testio

Yes, but no problem:

[dpachov@login-0-0 NEWTEST]$ ./testio
TEST PASSED: ftell gives: 46

As for the other questions:

HPC OS version:

[dpachov@login-0-0 NEWTEST]$ uname -a
Linux login-0-0.local 2.6.18-194.17.1.el5xen #1 SMP Mon Sep 20 07:20:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
[dpachov@login-0-0 NEWTEST]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.2 (Tikanga)

GROMACS 4.5.4 built with:

module purge
module load INTEL/intel-12.0
module load OPENMPI/1.4.3_INTEL_12.0
module load FFTW/2.1.5-INTEL_12.0   # not needed

## GROMACS settings
export CC=mpicc
export F77=mpif77
export CXX=mpic++
export FC=mpif90
export F90=mpif90

make distclean
echo "XXX building single prec XX"
./configure --prefix=/home/dpachov/mymodules/GROMACS/EXEC/4.5.4-INTEL_12.0/SINGLE \
  --enable-mpi \
  --enable-shared \
  --program-prefix= --program-suffix= \
  --enable-float --disable-fortran \
  --with-fft=mkl \
  --with-external-blas \
  --with-external-lapack \
  --with-gsl \
  --without-x \
  CFLAGS="-O3 -funroll-all-loops" \
  FFLAGS="-O3 -funroll-all-loops" \
  CPPFLAGS="-I${MPI_INCLUDE} -I${MKL_INCLUDE}" \
  LDFLAGS="-L${MPI_LIB} -L${MKL_LIB} -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -liomp5"
make -j 8
make install

Just did the same test on Hopper 2 (http://www.nersc.gov/users/computational-systems/hopper/) with their built GROMACS 4.5.3 (gromacs/4.5.3(default)), and the result was the same as reported earlier. You could do the test there as well, if you have access, and see what you would get. Hope that helps a bit.

Thanks,
Dimitar
Re: [gmx-users] Why does the -append option exist?
Hello,

Just a quick update after a few short tests we (my colleague and I) quickly did. First, using "You can emulate this yourself by calling 'sleep 10s' before mdrun and see if that's long enough to solve the latency issue in your case" doesn't work, mainly because it doesn't seem to be a latency issue, but also because the load on a node is not affected by "sleep".

However, you can reproduce the behavior I have observed pretty easily. It seems to be related to the values of the pointers to the *xtc, *trr, *edr, etc. files written at the end of the checkpoint file after abrupt crashes, AND to the frequency of access (opening) of those files. How to test:

1. In your input *mdp file, set a high frequency of saving coordinates to, say, the *xtc file (10, for example) and a low frequency for the *trr file (10,000, for example).
2. Run GROMACS (mdrun -s run.tpr -v -cpi -deffnm run).
3. Kill the run abruptly shortly after that (say, after 10-100 steps).
4. You should have a few frames written in the *xtc file, and only one (the first) in the *trr file. The *cpt file should have nonzero values of "file_offset_low" for all of these files (the pointers have been updated).
5. Restart GROMACS (mdrun -s run.tpr -v -cpi -deffnm run).
6. Kill the run abruptly shortly after that (say, after 10-100 steps). Note that the frequency for accessing/writing the *trr file has not been reached.
7. You should have a few additional frames written in the *xtc file, while the *trr file will still have only one frame (the first). The *cpt file now has updated all "file_offset_low" pointer values, BUT the pointer to the *trr file has acquired a value of 0. Obviously, we already know what will happen if we restart again from this last *cpt file.
8. Restart GROMACS (mdrun -s run.tpr -v -cpi -deffnm run).
9. Kill it.
10. The *trr file now has size zero.

Therefore, if a run is killed before the files are accessed for writing (depending on the chosen frequency), the file offset values recorded in the *cpt file don't seem to be updated accordingly, and hence a new restart inevitably leads to overwritten output files. Do you think this is fixable?

Thanks,
Dimitar

On Sun, Jun 5, 2011 at 6:20 PM, Roland Schulz rol...@utk.edu wrote:

Two comments about the discussion:

1) I agree that buffered output (kernel buffers, not application buffers) should not affect I/O. If it does, it should be filed as a bug against the OS. Maybe someone can write a short test application which tries to reproduce this idea: writing to a file from one node and, immediately after the test program is killed on one node, writing to it from some other node.

2) We lock files, but only the log file. The idea is that we only need to guarantee that the set of files is only accessed by one application. This seems safe, but in case someone sees a way the trajectory could be opened without the log file being opened, please file a bug.

Roland

On Sun, Jun 5, 2011 at 10:13 AM, Mark Abraham mark.abra...@anu.edu.au wrote:

On 5/06/2011 11:08 PM, Francesco Oteri wrote:

Dear Dimitar, I'm following the debate regarding: "The point was not why I was getting the restarts, but the fact itself that I was getting restarts close in time, as I stated in my first post. I actually also don't know whether jobs are deleted or suspended. I've thought that a job returned back to the queue will basically start from the beginning when later moved to an empty slot ... so don't understand the difference from that perspective."

In the second mail you say it was submitted by:

=====
ii=1
ifmpi="mpirun -np $NSLOTS"

if [ ! -f run${ii}-i.tpr ]; then
  cp run${ii}.tpr run${ii}-i.tpr
  tpbconv -s run${ii}-i.tpr -until 20 -o run${ii}.tpr
fi

k=`ls md-${ii}*.out | wc -l`
outfile="md-${ii}-$k.out"
if [[ -f run${ii}.cpt ]]; then
  $ifmpi `which mdrun` -s run${ii}.tpr -cpi run${ii}.cpt -v -deffnm run${ii} -npme 0 > $outfile 2>&1
fi
=====

If I understand well, you are submitting the SERIAL mdrun. This means that multiple instances of mdrun are running at the same time. Each instance of mdrun is an INDEPENDENT instance. Therefore checkpoint files, one for each instance (i.e. one for each CPU), are written at the same time.

Good thought, but Dimitar's stdout excerpts from early in the thread do indicate the presence of multiple execution threads. Dynamic load balancing gets turned on, and the DD is 4x2x1 for his 8 processors. Conventionally, and by default in the installation process, the MPI-enabled binaries get an _mpi suffix, but it isn't enforced - or enforceable :-)

Mark

--
gmx-users mailing list    gmx-users@gromacs.org
http://lists.gromacs.org/mailman/listinfo/gmx-users
Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
Please don't post (un)subscribe requests to the list. Use the www interface or
Re: [gmx-users] Why does the -append option exist?
Dimitar Pachov wrote:

> Therefore, if a run is killed before the files are accessed for writing
> (depending on the chosen frequency), the file offset values recorded in
> the *cpt file don't seem to be updated accordingly, and hence a new
> restart inevitably leads to overwritten output files. Do you think this
> is fixable?

Perhaps, but it will require some more details. I cannot reproduce this problem, and I wonder if it is compiler- or platform-specific. Can you please provide:

1. Compiler (and version) used to build Gromacs
2. Hardware details
3. Command used to configure Gromacs

-Justin

--
Justin A. Lemkul
Ph.D. Candidate
ICTAS Doctoral Scholar
MILES-IGERT Trainee
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin
Re: [gmx-users] Why does the -append option exist?
On 5/06/2011 12:31 PM, Dimitar Pachov wrote:

This script is not using mdrun -append.

-append is the default, it doesn't need to be explicitly listed.

Ah yes, very true. Your original post suggested the use of -append was a problem. Why aren't we seeing a script with mdrun -append?

Also, please provide the full script - it looks like there might be a loop around your tpbconv-then-mdrun fragment.

There is no loop; this is a job script with PBS directives. The header of it looks like:

=====
#!/bin/bash
#$ -S /bin/bash
#$ -pe mpich 8
#$ -ckpt reloc
#$ -l mem_total=6G
=====

submitted as usual by: qsub -N myjob.q

Note that a useful trouble-shooting technique can be to construct your command line in a shell variable, echo it to stdout (redirected as suitable), and then execute the contents of the variable. Now nobody has to parse a shell script to know what command line generated what output, and it can be co-located with the command's stdout.

I somewhat understand your point, but could you give an example if you think it is really necessary?

It's just generally helpful if your stdout has "mpirun -np 8 /path/to/mdrun_mpi -deffnm run_4 -cpi run_4" at the top of it, so that you have a definitive record of what you did under the environment that existed at the time of execution.

As I said, the queue is like this: you submit the job, it finds an empty node, it goes there; however, seconds later another user with higher privileges on that particular node submits a job, his job kicks out my job, mine goes back on the queue, it finds another empty node, goes there, then another user with high privileges on that node submits a job, which consequently kicks out my job again, and the cycle repeats itself ... theoretically, it could continue forever, depending on how many and where the empty nodes are, if any.

You've said that *now* - but previously you've said nothing about why you were getting lots of restarts. In my experience, PBS queues suspend jobs rather than deleting them, in order that resources are not wasted. Apparently other places do things this way. I think that this information is highly relevant to explaining your observations.

These many restarts suggest that the queue was full of relatively short jobs run by users with high privileges. Technically, I cannot see why the same processes should be running simultaneously, because at any instant my job runs only on one node, or it stays in the queuing list.

I/O can be buffered such that the termination of the process and the completion of its I/O are asynchronous. Perhaps it *shouldn't* be that way, but this is a problem for the administrators of your cluster to address. They know how the file system works. If the next job executes before the old one has finished its output, then I think the symptoms you observe might be possible. Note that there is nothing GROMACS can do about that, unless somehow GROMACS can apply a lock in the first mdrun that is respected by your file system, such that a subsequent mdrun cannot open the same file until all pending I/O has completed. I'd expect proper HPC file systems to do that automatically, but I don't really know.

From md-1-2360.out:

=====
:::
Getting Loaded...
Reading file run1.tpr, VERSION 4.5.4 (single precision)
Reading checkpoint file run1.cpt generated: Tue May 31 10:45:22 2011
Loaded with Money
Making 2D domain decomposition 4 x 2 x 1
WARNING: This run will generate roughly 4915 Mb of data
starting mdrun 'run1'
1 steps, 20.0 ps (continuing from step 51879590, 103759.2 ps).
=====

These aren't showing anything other than that the restart is coming from the same point each time.

And from the last generated output, md-1-2437.out (I think I killed the job at that point because of the above observed behavior):

=====
:::
Getting Loaded...
Reading file run1.tpr, VERSION 4.5.4 (single precision)
=====

I have at least 5-6 additional examples like this one.
In some of them the *xtc file does have size greater than zero, yet still very small, but it starts from some random frame (for example, in one of the cases it contains frames from ~91000 ps to ~104000 ps, but all frames before 91000 ps are missing).

I think that demonstrating a problem requires that the set of output files were fine before one particular restart, and weird afterwards. I don't think we've seen that yet.

I don't understand your point here. I am providing you with all the info I have. I am showing the output files of 3 restarts, and they are different in the sense that the last two did not progress far enough before another job restart
Re: [gmx-users] Why does the -append option exist?
On Sun, Jun 5, 2011 at 2:14 AM, Mark Abraham mark.abra...@anu.edu.auwrote: On 5/06/2011 12:31 PM, Dimitar Pachov wrote: As I said, the queue is like this: you submit the job, it finds an empty node, it goes there, however seconds later another user with higher privileges on that particular node submits a job, his job kicks out my job, mine goes on the queue again, it finds another empty node, goes there, then another user with high privileges on that node submits a job, which consequently kicks out my job again, and the cycle repeats itself ... theoretically, it could continue forever, depending on how many and where the empty nodes are, if any. You've said that *now* - but previously you've said nothing about why you were getting lots of restarts. In my experience, PBS queues suspend jobs rather than deleting them, in order that resources are not wasted. Apparently other places do things this way. I think that this information is highly relevant to explaining your observations. The point was not why I was getting the restarts, but the fact itself that I was getting restarts close in time, as I stated in my first post. I actually also don't know whether jobs are deleted or suspended. I've thought that a job returned back to the queue will basically start from the beginning when later moved to an empty slot ... so don't understand the difference from that perspective. These many restarts suggest that the queue was full with relatively short jobs ran by users with high privileges. Technically, I cannot see why the same processes should be running simultaneously because at any instant my job runs only on one node, or it stays in the queuing list. I/O can be buffered such that the termination of the process and the completion of its I/O are asynchronous. Perhaps it *shouldn't* be that way, but this is a problem for the administrators of your cluster to address. They know how the file system works. 
If the next job executes before the old one has finished output, then I think the symptoms you observe might be possible. Yes, this is true, and I believe the timing of when the buffer is fully flushed is crucial in providing a possible explanation in the observed behavior. However, this bottleneck has been known for a long time, so I expected people had thought about that before confidently putting -append as a default. That's all. Note that there is nothing GROMACS can do about that, unless somehow GROMACS can apply a lock in the first mdrun that is respected by your file system such that a subsequent mdrun cannot open the same file until all pending I/O has completed. I'd expect proper HPC file systems do that automatically, but I don't really know. I am not an expert nor do I know the Gromacs coding, but could one have an option to specify certain timing before which Gromacs is prohibited to output/write any files after its initial start, i.e. some kind of suspension and/or waiting period? I am also wondering about the checkpoint timing - the default is 15 min, but what would be the minimum? Since I have not tested it, what would happen if I specify 0.001 min, for example? From md-1-2360.out: = ::: Getting Loaded... Reading file run1.tpr, VERSION 4.5.4 (single precision) Reading checkpoint file run1.cpt generated: Tue May 31 10:45:22 2011 Loaded with Money Making 2D domain decomposition 4 x 2 x 1 WARNING: This run will generate roughly 4915 Mb of data starting mdrun 'run1' 1 steps, 20.0 ps (continuing from step 51879590, 103759.2 ps). = These aren't showing anything other than that the restart is coming from the same point each time. And from the last generated output md-1-2437.out (I think I killed the job at that point because of the above observed behavior): = ::: Getting Loaded... Reading file run1.tpr, VERSION 4.5.4 (single precision) = I have at least 5-6 additional examples like this one. 
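On the checkpoint-interval question: the interval is set by mdrun's -cpt flag, in minutes (default 15), and, as Mark notes elsewhere in this thread, checkpoints are only actually written at neighbour-search steps, so an extremely small value like 0.001 would in practice mean at most one checkpoint per NS step, at some I/O cost. A minimal sketch (file names are illustrative, and the command is echoed rather than executed here):

```shell
# Hypothetical restart line asking for a checkpoint roughly every minute
# instead of the default 15; echo it first to record exactly what would run.
cmd="mdrun -s run1.tpr -cpi run1.cpt -deffnm run1 -cpt 1"
echo "$cmd"
```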
In some of them the *xtc file does have size greater than zero yet still very small, but it starts from some random frame (for example, in one of the cases it contains frames from ~91000ps to ~104000ps, but all frames before 91000ps are missing). I think that demonstrating a problem requires that the set of output files were fine before one particular restart, and weird afterwards. I don't think we've seen that yet. I don't understand your point here. I am providing you with all the info I have. I am showing the output files of 3 restarts, and they are different in the sense that the last two did not progress far enough before another job restart occurred. The first was fine before the restart, and the others were not exactly fine after the restart. At this point I realize that what I call restart and what you call restart might be two
Re: [gmx-users] Why does the -append option exist?
On 5/06/2011 5:42 PM, Dimitar Pachov wrote: On Sun, Jun 5, 2011 at 2:14 AM, Mark Abraham mark.abra...@anu.edu.au mailto:mark.abra...@anu.edu.au wrote: On 5/06/2011 12:31 PM, Dimitar Pachov wrote: As I said, the queue is like this: you submit the job, it finds an empty node, it goes there, however seconds later another user with higher privileges on that particular node submits a job, his job kicks out my job, mine goes on the queue again, it finds another empty node, goes there, then another user with high privileges on that node submits a job, which consequently kicks out my job again, and the cycle repeats itself ... theoretically, it could continue forever, depending on how many and where the empty nodes are, if any. You've said that *now* - but previously you've said nothing about why you were getting lots of restarts. In my experience, PBS queues suspend jobs rather than deleting them, in order that resources are not wasted. Apparently other places do things this way. I think that this information is highly relevant to explaining your observations. The point was not why I was getting the restarts, but the fact itself that I was getting restarts close in time, as I stated in my first post. I actually also don't know whether jobs are deleted or suspended. I've thought that a job returned back to the queue will basically start from the beginning when later moved to an empty slot ... so don't understand the difference from that perspective. It's the difference between a process being killed, and a process being allowed to survive but temporarily without access to the CPU. Operating systems routinely share the CPU over multiple execution threads. Job suspension just adapts that idea. Also, different UNIX signals are interpreted differently by the GROMACS signal handler. It respects hard kills, but it cooperates with gentler kills by updating the checkpoint file at the next neighbour-search step, IIRC. 
Perhaps your PBS is making excessive use of hard kills - if it didn't, you still get to make some progress when you only get a minute of CPU time... These many restarts suggest that the queue was full with relatively short jobs ran by users with high privileges. Technically, I cannot see why the same processes should be running simultaneously because at any instant my job runs only on one node, or it stays in the queuing list. I/O can be buffered such that the termination of the process and the completion of its I/O are asynchronous. Perhaps it *shouldn't* be that way, but this is a problem for the administrators of your cluster to address. They know how the file system works. If the next job executes before the old one has finished output, then I think the symptoms you observe might be possible. Yes, this is true, and I believe the timing of when the buffer is fully flushed is crucial in providing a possible explanation in the observed behavior. However, this bottleneck has been known for a long time, so I expected people had thought about that before confidently putting -append as a default. That's all. Judging by the frequency of people reporting problems, most people don't encounter the kind of file system latency leading to race condition problem I think that you're seeing. Some might see it, and just work around, as you say. Or other people just don't have the combination of file system and compute resource management that you have to work with. Note that there is nothing GROMACS can do about that, unless somehow GROMACS can apply a lock in the first mdrun that is respected by your file system such that a subsequent mdrun cannot open the same file until all pending I/O has completed. I'd expect proper HPC file systems do that automatically, but I don't really know. 
I am not an expert nor do I know the Gromacs coding, but could one have an option to specify a certain time before which Gromacs is prohibited from writing any files after its initial start, i.e. some kind of suspension and/or waiting period? One could delay some/all output initialization until the first write, but it probably makes the code rather more messy. GROMACS does check that the state of the output files makes sense, by computing and comparing checksums stored in the checkpoint file. One has to draw a line somewhere. If the contents of those files might be changed by another process, then efficient MD is simply impossible. Also, there would be people complaining that they spent 15 minutes on their 1024-processor simulation before it died when the lack of write permission for the checkpoint filename got noticed. Perhaps not that exact scenario, but something similar could arise. You can emulate this yourself by calling sleep 10s before mdrun and see if that's long enough to solve the latency issue in your case. It seems to me that this kind of
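Mark's sleep suggestion, as a job-script sketch; the delay length is a site-specific guess (he suggests roughly 10 s), not a verified number:

```shell
# Give any pending I/O from a killed predecessor time to reach the shared
# file system before this restart opens the output files.
DELAY=2                      # seconds; a guess to tune per file system
sleep "$DELAY"
echo "waited ${DELAY}s before restarting"
# ...the usual 'mpirun ... mdrun -cpi run1.cpt ...' line would follow here
```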
Re: [gmx-users] Why does the -append option exist?
Dear Dimitar, I'm following the debate regarding:

The point was not why I was getting the restarts, but the fact itself that I was getting restarts close in time, as I stated in my first post. I actually also don't know whether jobs are deleted or suspended. I've thought that a job returned back to the queue will basically start from the beginning when later moved to an empty slot ... so don't understand the difference from that perspective.

In the second mail you say:

Submitted by:
ii=1
ifmpi="mpirun -np $NSLOTS"

if [ ! -f run${ii}-i.tpr ]; then
  cp run${ii}.tpr run${ii}-i.tpr
  tpbconv -s run${ii}-i.tpr -until 20 -o run${ii}.tpr
fi

k=`ls md-${ii}*.out | wc -l`
outfile=md-${ii}-$k.out
if [[ -f run${ii}.cpt ]]; then
  $ifmpi `which mdrun` -s run${ii}.tpr -cpi run${ii}.cpt -v -deffnm run${ii} -npme 0 >> $outfile 2>&1
fi
=

If I understand well, you are submitting the SERIAL mdrun. This means that multiple instances of mdrun are running at the same time. Each instance of mdrun is an INDEPENDENT instance. Therefore checkpoint files, one for each instance (i.e. one for each CPU), are written at the same time.
--
gmx-users mailing list    gmx-users@gromacs.org
http://lists.gromacs.org/mailman/listinfo/gmx-users
Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org.
Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] Why does the -append option exist?
On 5/06/2011 11:08 PM, Francesco Oteri wrote: Dear Dimitar, I'm following the debate regarding:

The point was not why I was getting the restarts, but the fact itself that I was getting restarts close in time, as I stated in my first post. I actually also don't know whether jobs are deleted or suspended. I've thought that a job returned back to the queue will basically start from the beginning when later moved to an empty slot ... so don't understand the difference from that perspective.

In the second mail you say:

Submitted by:
ii=1
ifmpi="mpirun -np $NSLOTS"

if [ ! -f run${ii}-i.tpr ]; then
  cp run${ii}.tpr run${ii}-i.tpr
  tpbconv -s run${ii}-i.tpr -until 20 -o run${ii}.tpr
fi

k=`ls md-${ii}*.out | wc -l`
outfile=md-${ii}-$k.out
if [[ -f run${ii}.cpt ]]; then
  $ifmpi `which mdrun` -s run${ii}.tpr -cpi run${ii}.cpt -v -deffnm run${ii} -npme 0 >> $outfile 2>&1
fi
=

If I understand well, you are submitting the SERIAL mdrun. This means that multiple instances of mdrun are running at the same time. Each instance of mdrun is an INDEPENDENT instance. Therefore checkpoint files, one for each instance (i.e. one for each CPU), are written at the same time.

Good thought, but Dimitar's stdout excerpts from early in the thread do indicate the presence of multiple execution threads. Dynamic load balancing gets turned on, and the DD is 4x2x1 for his 8 processors. Conventionally, and by default in the installation process, the MPI-enabled binaries get an _mpi suffix, but it isn't enforced - or enforceable :-)

Mark
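On the serial-vs-MPI question: whether the mdrun on PATH is actually MPI-linked can be checked directly rather than inferred from the _mpi suffix. A rough sketch for dynamically linked Linux binaries (the binary location and the library name pattern are assumptions):

```shell
# Assumption: Linux with ldd available; 'mdrun' may not be installed, so
# fall back to a hypothetical ./mdrun path to keep the sketch self-contained.
bin=$(command -v mdrun || echo ./mdrun)
if ldd "$bin" 2>/dev/null | grep -qi 'libmpi'; then
  echo "$bin looks MPI-enabled"
else
  echo "$bin: no MPI library found (serial, statically linked, or missing)"
fi
```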
Re: [gmx-users] Why does the -append option exist?
Two comments about the discussion:

1) I agree that buffered output (kernel buffers - not application buffers) should not affect I/O. If it does, it should be filed as a bug against the OS. Maybe someone can write a short test application which tries to reproduce this: write to a file from one node, kill the test program on that node, and immediately afterwards write to the same file from some other node.

2) We do lock files, but only the log file. The idea is that we only need to guarantee that the set of files is accessed by only one application. This seems safe, but in case someone sees a way in which the trajectory could be opened without the log file being opened, please file a bug.

Roland

On Sun, Jun 5, 2011 at 10:13 AM, Mark Abraham mark.abra...@anu.edu.au wrote: On 5/06/2011 11:08 PM, Francesco Oteri wrote: Dear Dimitar, I'm following the debate regarding:

The point was not why I was getting the restarts, but the fact itself that I was getting restarts close in time, as I stated in my first post. I actually also don't know whether jobs are deleted or suspended. I've thought that a job returned back to the queue will basically start from the beginning when later moved to an empty slot ... so don't understand the difference from that perspective.

In the second mail you say:

Submitted by:
ii=1
ifmpi="mpirun -np $NSLOTS"

if [ ! -f run${ii}-i.tpr ]; then
  cp run${ii}.tpr run${ii}-i.tpr
  tpbconv -s run${ii}-i.tpr -until 20 -o run${ii}.tpr
fi

k=`ls md-${ii}*.out | wc -l`
outfile=md-${ii}-$k.out
if [[ -f run${ii}.cpt ]]; then
  $ifmpi `which mdrun` -s run${ii}.tpr -cpi run${ii}.cpt -v -deffnm run${ii} -npme 0 >> $outfile 2>&1
fi
=

If I understand well, you are submitting the SERIAL mdrun. This means that multiple instances of mdrun are running at the same time. Each instance of mdrun is an INDEPENDENT instance. Therefore checkpoint files, one for each instance (i.e. one for each CPU), are written at the same time.
Good thought, but Dimitar's stdout excerpts from early in the thread do indicate the presence of multiple execution threads. Dynamic load balancing gets turned on, and the DD is 4x2x1 for his 8 processors. Conventionally, and by default in the installation process, the MPI-enabled binaries get an _mpi suffix, but it isn't enforced - or enforceable :-) Mark
--
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537, ORNL PO BOX 2008 MS6309
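Roland's point (1) could start from a single-machine sketch like the one below; a real test would kill the writer on one node and append from another node across the shared file system (the file name and loop count here are arbitrary):

```shell
# Single-machine sketch: writer 1 appends until SIGKILLed, writer 2 then
# appends; nothing buffered by the first writer should clobber the second.
f=testio.dat
: > "$f"
( i=0; while [ "$i" -lt 100000 ]; do echo "A$i" >> "$f"; i=$((i+1)); done ) &
writer=$!
sleep 1
kill -9 "$writer" 2>/dev/null || true
wait "$writer" 2>/dev/null || true
echo "B-after-kill" >> "$f"          # the "second node" writing after the kill
tail -n 1 "$f"                       # the second writer's line should survive intact
```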
Re: [gmx-users] Why does the -append option exist?
By the way, is this ever reviewed: Your mail to 'gmx-users' with the subject Re: [gmx-users] Why does the -append option exist? Is being held until the list moderator can review it for approval. On Fri, Jun 3, 2011 at 9:24 PM, Mark Abraham mark.abra...@anu.edu.au wrote: On 4/06/2011 8:26 AM, Dimitar Pachov wrote If this is true, then it wants fixing, and fast, and will get it :-) However, it would be surprising for such a problem to exist and not have been reported up to now. This feature has been in the code for a year now, and while some minor issues have been fixed since the 4.5 release, it would surprise me greatly if your claim was true. You're saying the equivalent of the steps below can occur: 1. Simulation wanders along normally and writes a checkpoint at step 1003 2. Random crash happens at step 1106 3. An -append restart from the old .tpr and the recent .cpt file will restart from step 1003 4. Random crash happens at step 1059 5. Now a restart doesn't restart from step 1003, but some other step and most importantly, the most important piece of data, that being the trajectory file, could be completely lost! I don't know the code behind the checkpointing appending, but I can see how easy one can overwrite 100ns trajectories, for example, and obtain the same trajectories of size 0. I don't see how easy that is, without a concrete example, where user error is not possible. 
Here is an example:

[dpachov]$ ll -rth run1* \#run1*
-rw-rw-r-- 1 dpachov dpachov  11K May  2 02:59 run1.po.mdp
-rw-rw-r-- 1 dpachov dpachov 4.6K May  2 02:59 run1.grompp.out
-rw-rw-r-- 1 dpachov dpachov 3.5M May 13 19:09 run1.gro
-rw-rw-r-- 1 dpachov dpachov 2.3M May 14 00:40 run1.tpr
-rw-rw-r-- 1 dpachov dpachov 2.3M May 14 00:40 run1-i.tpr
-rw-rw-r-- 1 dpachov dpachov    0 May 29 21:53 run1.trr
-rw-rw-r-- 1 dpachov dpachov 1.2M May 31 10:45 run1.cpt
-rw-rw-r-- 1 dpachov dpachov 1.2M May 31 10:45 run1_prev.cpt
-rw-rw-r-- 1 dpachov dpachov    0 Jun  3 14:03 run1.xtc
-rw-rw-r-- 1 dpachov dpachov    0 Jun  3 14:03 run1.edr
-rw-rw-r-- 1 dpachov dpachov  15M Jun  3 17:03 run1.log

Submitted by:

ii=1
ifmpi="mpirun -np $NSLOTS"

if [ ! -f run${ii}-i.tpr ]; then
  cp run${ii}.tpr run${ii}-i.tpr
  tpbconv -s run${ii}-i.tpr -until 20 -o run${ii}.tpr
fi

k=`ls md-${ii}*.out | wc -l`
outfile=md-${ii}-$k.out
if [[ -f run${ii}.cpt ]]; then
  $ifmpi `which mdrun` -s run${ii}.tpr -cpi run${ii}.cpt -v -deffnm run${ii} -npme 0 >> $outfile 2>&1
fi
=

From the end of run1.log:
=
Started mdrun on node 0 Tue May 31 10:28:52 2011

           Step           Time         Lambda
       51879390   103758.78000            0.0

Energies (kJ/mol)
           U-B    Proper Dih.  Improper Dih.      CMAP Dih.          LJ-14
   8.37521e+03    4.52303e+03    4.78633e+02   -1.23174e+03    2.87366e+03
    Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)   Coul. recip.
   3.02277e+04    9.48267e+04   -3.88596e+03   -7.43902e+05   -8.36436e+04
     Potential    Kinetic En.   Total Energy    Temperature Pres. DC (bar)
  -6.91359e+05    1.29016e+05   -5.62342e+05    3.00159e+02   -1.24746e+02
Pressure (bar)   Constr. rmsd
  -2.43143e+00        0.0e+00

DD  step 51879399  load imb.: force 225.5%

Writing checkpoint, step 51879590 at Tue May 31 10:45:22 2011
---
Restarting from checkpoint, appending to previous log file.
Log file opened on Fri Jun 3 17:03:20 2011
Host: compute-1-13.local  pid: 337  nodeid: 0  nnodes: 8
The Gromacs distribution was built Tue Mar 22 09:26:37 EDT 2011 by
dpachov@login-0-0.local (Linux 2.6.18-194.17.1.el5xen x86_64)
:::
:::
:::
Grid: 13 x 15 x 11 cells
Initial temperature: 301.137 K

Started mdrun on node 0 Fri Jun 3 13:58:07 2011

           Step           Time         Lambda
       51879590   103759.18000            0.0

Energies (kJ/mol)
           U-B    Proper Dih.  Improper Dih.      CMAP Dih.          LJ-14
   8.47435e+03    4.61654e+03    3.99388e+02   -1.16765e+03    2.93920e+03
    Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)   Coul. recip.
   2.99294e+04    9.42035e+04   -3.87927e+03   -7.43250e+05   -8.35872e+04
     Potential    Kinetic En.   Total Energy    Temperature Pres. DC (bar)
  -6.91322e+05    1.29433e+05   -5.61889e+05    3.01128e+02   -1.24317e+02
Pressure (bar)   Constr. rmsd
  -2.18259e+00        0.0e+00

DD  step 51879599  load imb.: force 43.7%

At step 51879600 the performance loss due to force load imbalance is 17.5 %
NOTE: Turning on dynamic load balancing

DD  step 5187  vol min/aver 0.643  load imb.: force 0.4%
::
::
DD  step 51884999  vol min/aver 0.647  load imb.: force 0.3%

           Step           Time         Lambda
       51885000       103770.0            0.0

Energies (kJ/mol
Re: [gmx-users] Why does the -append option exist?
Hi, On Jun 4, 2011, at 19:11, Dimitar Pachov dpac...@brandeis.edu wrote: By the way, is this ever reviewed: Your mail to 'gmx-users' with the subject Re: [gmx-users] Why does the -append option exist? Is being held until the list moderator can review it for approval. This message usually comes when e.g. one sends mails larger than 50K which are eventually discarded. If you need to send big attachments post a download link instead. Cheers, Rossen On Fri, Jun 3, 2011 at 9:24 PM, Mark Abraham mark.abra...@anu.edu.au wrote: On 4/06/2011 8:26 AM, Dimitar Pachov wrote If this is true, then it wants fixing, and fast, and will get it :-) However, it would be surprising for such a problem to exist and not have been reported up to now. This feature has been in the code for a year now, and while some minor issues have been fixed since the 4.5 release, it would surprise me greatly if your claim was true. You're saying the equivalent of the steps below can occur: 1. Simulation wanders along normally and writes a checkpoint at step 1003 2. Random crash happens at step 1106 3. An -append restart from the old .tpr and the recent .cpt file will restart from step 1003 4. Random crash happens at step 1059 5. Now a restart doesn't restart from step 1003, but some other step and most importantly, the most important piece of data, that being the trajectory file, could be completely lost! I don't know the code behind the checkpointing appending, but I can see how easy one can overwrite 100ns trajectories, for example, and obtain the same trajectories of size 0. I don't see how easy that is, without a concrete example, where user error is not possible. 
Re: [gmx-users] Why does the -append option exist?
On Fri, 3 Jun 2011 18:26:17 -0400 Dimitar Pachov dpac...@brandeis.edu wrote: If one uses that option and the run is restarted and is again restarted before reaching the point of attempting to write a file, then things are lost, and most importantly, the most important piece of data, that being the trajectory file, could be completely lost! I don't know the code behind the checkpointing appending, but I can see how easy one can overwrite 100ns trajectories, for example, and obtain the same trajectories of size 0. So you are referring to the case where you have multiple, independent processes all using the same trajectory file. Yes, this will probably lead to problems, unless the trajectory file is somehow locked. So: does GROMACS lock the trajectory files it operates upon? If not, it should. -- Mr. Jussi Lehtola, M. Sc. Doctoral Student jussi.leht...@helsinki.fi Department of Physics http://www.helsinki.fi/~jzlehtol University of Helsinki Office phone: +358 9 191 50 632 Finland Jussi Lehtola, FM Tohtorikoulutettava jussi.leht...@helsinki.fi Fysiikan laitos http://www.helsinki.fi/~jzlehtol Helsingin Yliopisto Työpuhelin: (0)9 191 50 632
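For what it's worth, a job script can add its own advisory lock around the whole run with util-linux flock (assuming it is installed; the lock-file name is arbitrary). Note that advisory locks are not always honoured across network file systems, which is exactly the uncertainty being discussed in this thread:

```shell
# Hold an advisory lock on run1.lock for the lifetime of this script; a
# second instance started while the first is alive (or still flushing,
# if the file system honours the lock) refuses to run.
exec 9> run1.lock
if command -v flock >/dev/null && flock -n 9; then
  echo "lock acquired, safe to start mdrun"
  # ...mdrun would run here; the lock is released when fd 9 closes at exit
else
  echo "could not lock run1.lock (held by another instance, or no flock)" >&2
fi
```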
Re: [gmx-users] Why does the -append option exist?
On 5/06/2011 3:11 AM, Dimitar Pachov wrote: On Fri, Jun 3, 2011 at 9:24 PM, Mark Abraham mark.abra...@anu.edu.au mailto:mark.abra...@anu.edu.au wrote: On 4/06/2011 8:26 AM, Dimitar Pachov wrote If this is true, then it wants fixing, and fast, and will get it :-) However, it would be surprising for such a problem to exist and not have been reported up to now. This feature has been in the code for a year now, and while some minor issues have been fixed since the 4.5 release, it would surprise me greatly if your claim was true. You're saying the equivalent of the steps below can occur: 1. Simulation wanders along normally and writes a checkpoint at step 1003 2. Random crash happens at step 1106 3. An -append restart from the old .tpr and the recent .cpt file will restart from step 1003 4. Random crash happens at step 1059 5. Now a restart doesn't restart from step 1003, but some other step and most importantly, the most important piece of data, that being the trajectory file, could be completely lost! I don't know the code behind the checkpointing appending, but I can see how easy one can overwrite 100ns trajectories, for example, and obtain the same trajectories of size 0. I don't see how easy that is, without a concrete example, where user error is not possible. Here is an example: [dpachov]$ ll -rth run1* \#run1* -rw-rw-r-- 1 dpachov dpachov 11K May 2 02:59 run1.po.mdp -rw-rw-r-- 1 dpachov dpachov 4.6K May 2 02:59 run1.grompp.out -rw-rw-r-- 1 dpachov dpachov 3.5M May 13 19:09 run1.gro -rw-rw-r-- 1 dpachov dpachov 2.3M May 14 00:40 run1.tpr -rw-rw-r-- 1 dpachov dpachov 2.3M May 14 00:40 run1-i.tpr -rw-rw-r-- 1 dpachov dpachov0 May 29 21:53 run1.trr -rw-rw-r-- 1 dpachov dpachov 1.2M May 31 10:45 run1.cpt -rw-rw-r-- 1 dpachov dpachov 1.2M May 31 10:45 run1_prev.cpt -rw-rw-r-- 1 dpachov dpachov0 Jun 3 14:03 run1.xtc -rw-rw-r-- 1 dpachov dpachov0 Jun 3 14:03 run1.edr -rw-rw-r-- 1 dpachov dpachov 15M Jun 3 17:03 run1.log Submitted by: ii=1 ifmpi=mpirun -np $NSLOTS if [ ! 
-f run${ii}-i.tpr ];then cp run${ii}.tpr run${ii}-i.tpr tpbconv -s run${ii}-i.tpr -until 20 -o run${ii}.tpr fi k=`ls md-${ii}*.out | wc -l` outfile=md-${ii}-$k.out if [[ -f run${ii}.cpt ]]; then $ifmpi `which mdrun` -s run${ii}.tpr -cpi run${ii}.cpt -v -deffnm run${ii} -npme 0 $outfile 21 fi = This script is not using mdrun -append. Your original post suggested the use of -append was a problem. Why aren't we seeing a script with mdrun -append? Also, please provide the full script - it looks like there might be a loop around your tpbconv-then-mdrun fragment. Note that a useful trouble-shooting technique can be to construct your command line in a shell variable, echo it to stdout (redirected as suitable) and then execute the contents of the variable. Now, nobody has to parse a shell script to know what command line generated what output, and it can be co-located with the command's stdout. From the end of run1.log: = Started mdrun on node 0 Tue May 31 10:28:52 2011 Step Time Lambda 51879390 103758.780000.0 Energies (kJ/mol) U-BProper Dih. Improper Dih. CMAP Dih. LJ-14 8.37521e+034.52303e+034.78633e+02 -1.23174e+03 2.87366e+03 Coulomb-14LJ (SR) Disper. corr. Coulomb (SR) Coul. recip. 3.02277e+049.48267e+04 -3.88596e+03 -7.43902e+05 -8.36436e+04 PotentialKinetic En. Total EnergyTemperature Pres. DC (bar) -6.91359e+051.29016e+05 -5.62342e+053.00159e+02 -1.24746e+02 Pressure (bar) Constr. rmsd -2.43143e+000.0e+00 DD step 51879399 load imb.: force 225.5% snip Writing checkpoint, step 51879590 at Tue May 31 10:45:22 2011 Energies (kJ/mol) U-BProper Dih. Improper Dih. CMAP Dih. LJ-14 8.33208e+034.72300e+035.31983e+02 -1.21532e+03 2.89586e+03 Coulomb-14LJ (SR) Disper. corr. Coulomb (SR) Coul. recip. 3.00900e+049.31785e+04 -3.87790e+03 -7.40841e+05 -8.36838e+04 PotentialKinetic En. Total EnergyTemperature Pres. DC (bar) -6.89867e+051.28721e+05 -5.61146e+052.99472e+02 -1.24229e+02 Pressure (bar) Constr. 
rmsd
  -1.03491e+02    2.99840e-05

So the -append restart looks like it did fine here.

Last output files from restarts:

[dpachov]$ ll -rth md-1-*out | tail -10
-rw-rw-r-- 1 dpachov dpachov 6.1K Jun  3 16:40 md-1-2428.out
-rw-rw-r-- 1 dpachov dpachov 6.2K Jun  3 16:44 md-1-2429.out
-rw-rw-r-- 1 dpachov dpachov 5.9K Jun  3 16:46 md-1-2430.out
Re: [gmx-users] Why does the -append option exist?
On Sat, Jun 4, 2011 at 9:09 PM, Mark Abraham mark.abra...@anu.edu.au wrote: On 5/06/2011 3:11 AM, Dimitar Pachov wrote: On Fri, Jun 3, 2011 at 9:24 PM, Mark Abraham mark.abra...@anu.edu.au wrote: On 4/06/2011 8:26 AM, Dimitar Pachov wrote

Here is an example:

[dpachov]$ ll -rth run1* \#run1*
-rw-rw-r-- 1 dpachov dpachov  11K May  2 02:59 run1.po.mdp
-rw-rw-r-- 1 dpachov dpachov 4.6K May  2 02:59 run1.grompp.out
-rw-rw-r-- 1 dpachov dpachov 3.5M May 13 19:09 run1.gro
-rw-rw-r-- 1 dpachov dpachov 2.3M May 14 00:40 run1.tpr
-rw-rw-r-- 1 dpachov dpachov 2.3M May 14 00:40 run1-i.tpr
-rw-rw-r-- 1 dpachov dpachov    0 May 29 21:53 run1.trr
-rw-rw-r-- 1 dpachov dpachov 1.2M May 31 10:45 run1.cpt
-rw-rw-r-- 1 dpachov dpachov 1.2M May 31 10:45 run1_prev.cpt
-rw-rw-r-- 1 dpachov dpachov    0 Jun  3 14:03 run1.xtc
-rw-rw-r-- 1 dpachov dpachov    0 Jun  3 14:03 run1.edr
-rw-rw-r-- 1 dpachov dpachov  15M Jun  3 17:03 run1.log

Submitted by:

ii=1
ifmpi="mpirun -np $NSLOTS"

if [ ! -f run${ii}-i.tpr ]; then
  cp run${ii}.tpr run${ii}-i.tpr
  tpbconv -s run${ii}-i.tpr -until 20 -o run${ii}.tpr
fi

k=`ls md-${ii}*.out | wc -l`
outfile=md-${ii}-$k.out
if [[ -f run${ii}.cpt ]]; then
  $ifmpi `which mdrun` -s run${ii}.tpr -cpi run${ii}.cpt -v -deffnm run${ii} -npme 0 >> $outfile 2>&1
fi
=

This script is not using mdrun -append.

-append is the default, it doesn't need to be explicitly listed.

Your original post suggested the use of -append was a problem. Why aren't we seeing a script with mdrun -append? Also, please provide the full script - it looks like there might be a loop around your tpbconv-then-mdrun fragment.

There is no loop; this is a job script with PBS directives. The header of it looks like:

===
#!/bin/bash
#$ -S /bin/bash
#$ -pe mpich 8
#$ -ckpt reloc
#$ -l mem_total=6G
===

as usual submitted by: qsub -N myjob.q

Note that a useful trouble-shooting technique can be to construct your command line in a shell variable, echo it to stdout (redirected as suitable) and then execute the contents of the variable.
Now, nobody has to parse a shell script to know what command line generated what output, and it can be co-located with the command's stdout.

I somewhat understand your point, but I could give an example if you think it is really necessary?

<snip>

Writing checkpoint, step 51879590 at Tue May 31 10:45:22 2011

   Energies (kJ/mol)
           U-B    Proper Dih.  Improper Dih.      CMAP Dih.          LJ-14
   8.33208e+03    4.72300e+03    5.31983e+02   -1.21532e+03    2.89586e+03
    Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)   Coul. recip.
   3.00900e+04    9.31785e+04   -3.87790e+03   -7.40841e+05   -8.36838e+04
     Potential    Kinetic En.   Total Energy    Temperature Pres. DC (bar)
  -6.89867e+05    1.28721e+05   -5.61146e+05    2.99472e+02   -1.24229e+02
  Pressure (bar)   Constr. rmsd
  -1.03491e+02    2.99840e-05

So the -append restart looks like it did fine here. Last output files from restarts:

[dpachov]$ ll -rth md-1-*out | tail -10
-rw-rw-r-- 1 dpachov dpachov 6.1K Jun  3 16:40 md-1-2428.out
-rw-rw-r-- 1 dpachov dpachov 6.2K Jun  3 16:44 md-1-2429.out
-rw-rw-r-- 1 dpachov dpachov 5.9K Jun  3 16:46 md-1-2430.out
-rw-rw-r-- 1 dpachov dpachov 5.9K Jun  3 16:48 md-1-2431.out
-rw-rw-r-- 1 dpachov dpachov 6.1K Jun  3 16:50 md-1-2432.out
-rw-rw-r-- 1 dpachov dpachov    0 Jun  3 16:52 md-1-2433.out
-rw-rw-r-- 1 dpachov dpachov 6.2K Jun  3 16:55 md-1-2434.out
-rw-rw-r-- 1 dpachov dpachov 6.2K Jun  3 16:58 md-1-2435.out
-rw-rw-r-- 1 dpachov dpachov 5.9K Jun  3 17:03 md-1-2436.out
*-rw-rw-r-- 1 dpachov dpachov 5.8K Jun  3 17:04 md-1-2437.out*

+ around the time when the run1.xtc file seems to have been saved:

[dpachov]$ ll -rth md-1-23[5-6][0-9]*out
-rw-rw-r-- 1 dpachov dpachov 6.2K Jun  3 13:37 md-1-2350.out
-rw-rw-r-- 1 dpachov dpachov 6.1K Jun  3 13:39 md-1-2351.out
-rw-rw-r-- 1 dpachov dpachov 6.2K Jun  3 13:43 md-1-2352.out
-rw-rw-r-- 1 dpachov dpachov 6.2K Jun  3 13:45 md-1-2353.out
-rw-rw-r-- 1 dpachov dpachov 5.9K Jun  3 13:46 md-1-2354.out
-rw-rw-r-- 1 dpachov dpachov    0 Jun  3 13:47 md-1-2355.out
-rw-rw-r-- 1 dpachov dpachov 6.1K Jun  3 13:49 md-1-2356.out
-rw-rw-r-- 1 dpachov dpachov 6.1K Jun  3 13:52 md-1-2357.out
-rw-rw-r-- 1 dpachov dpachov  12K Jun  3 13:57 md-1-2358.out
*-rw-rw-r-- 1 dpachov dpachov  12K Jun  3 14:02 md-1-2359.out*
*-rw-rw-r-- 1 dpachov dpachov 6.0K Jun  3 14:03 md-1-2360.out*
-rw-rw-r-- 1 dpachov dpachov 6.2K Jun  3 14:06 md-1-2361.out
-rw-rw-r-- 1 dpachov dpachov 5.8K Jun  3 14:09 md-1-2362.out
Re: [gmx-users] Why does the -append option exist?
Well, sometimes I run out of walltime when doing long simulations, and append saves me from doing any file management after restarting a simulation from the previous checkpoint.

On Fri, Jun 3, 2011 at 3:26 PM, Dimitar Pachov dpac...@brandeis.edu wrote:

At first, I thought the -append option of the mdrun command was great. However, I don't think so anymore, and I have actually started questioning why it exists in the first place, and second, why it has become the default option in the newest versions. It is useless unless you run your simulations in a mode that is 100% safe from unexpected problems (hardware failures, restarts, etc.), which is never the case. It is beyond me how such an option can become the default, and how a statement like this:

"By default the output will be appending to the existing output files. The checkpoint file contains checksums of all output files, such that *you will never loose data when some output files are modified, corrupt or removed.*"

can be claimed without testing ALL of the scenarios that can lead to problems, that is, lost data. If one uses that option and the run is restarted, and is again restarted before reaching the point of attempting to write a file, then things are lost, and most importantly, the most important piece of data, the trajectory file, could be completely lost! I don't know the code behind the checkpointing/appending, but I can see how easily one can overwrite 100 ns trajectories, for example, and end up with the same trajectories at size 0.

Two restarts within a time frame where the trajectory file is updated doesn't make sense. I really don't understand how you can lose trajectory files.

Using the checkpoint/appending capability makes sense when many restarts are expected, but unfortunately it is exactly then that these options completely fail!

How?
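For reference, the walltime-restart workflow under discussion looks roughly like this (a sketch, not a definitive recipe; "md-1" is an illustrative -deffnm prefix, and -cpt/-cpi/-noappend are the relevant mdrun options in the 4.5 series):

```shell
# Initial run: write a checkpoint every 15 minutes until the queue's
# walltime limit kills the job.
mdrun -deffnm md-1 -cpt 15

# Restart with appending (the 4.5.x default): continue writing into the
# existing md-1.xtc / md-1.edr / md-1.log files.
mdrun -deffnm md-1 -cpi md-1.cpt

# Restart without appending: prior files are left untouched and new
# output goes to part-numbered files (md-1.part0002.xtc, ...), at the
# cost of concatenating the parts yourself afterwards.
mdrun -deffnm md-1 -cpi md-1.cpt -noappend
```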
As a new user of GROMACS, I must say I am disappointed, and I would like an explanation of why the usage of these options is clearly stated to be safe when it is not, why the append option is the default, and why not even a single warning has been posted anywhere in the docs or manuals.

Use -noappend.

Thanks,
Dimitar

--
gmx-users mailing list    gmx-users@gromacs.org
http://lists.gromacs.org/mailman/listinfo/gmx-users
Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org.
Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] Why does the -append option exist?
On 4/06/2011 8:26 AM, Dimitar Pachov wrote:

At first, I thought the -append option of the mdrun command was great. However, I don't think so anymore, and I have actually started questioning why it exists in the first place, and second, why it has become the default option in the newest versions.

It exists because it used to be a pain to manage your simulation file numbering.

It is useless unless you run your simulations in a mode that is 100% safe from unexpected problems (hardware failures, restarts, etc.), which is never the case. It is beyond me how such an option can become the default, and how a statement like this:

"By default the output will be appending to the existing output files. The checkpoint file contains checksums of all output files, such that *you will never loose data when some output files are modified, corrupt or removed.*"

can be claimed without testing ALL of the scenarios that can lead to problems, that is, lost data.

The checkpoint file records the position of the output file pointers at the time of the checkpoint, along with an MD5 checksum. Upon restarting with -append, mdrun seeks to that file pointer position, verifies the checksum, and issues a fatal error if this is not possible. So if the checkpoint and other files are not altered or removed after a crash, the method seems pretty safe to me. The text above claims you are safe even if you remove files - that's an overstatement. However, I can't see how removing a non-checkpoint file could lead to the loss of useful data from other non-checkpoint files.

If one uses that option and the run is restarted, and is again restarted before reaching the point of attempting to write a file, then things are lost,

If this is true, then it wants fixing, and fast, and it will get it :-) However, it would be surprising for such a problem to exist and not have been reported before now.
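The mechanism described above can be sketched in a few lines (illustrative Python, not GROMACS's actual C implementation; all file names and function names here are made up): record the output file's offset and an MD5 checksum at checkpoint time, then verify the prefix and truncate back to the offset before appending on restart.

```python
import hashlib
import os

def write_checkpoint(cpt_path, out_path):
    """Record the output file's current size and the MD5 of its contents."""
    size = os.path.getsize(out_path)
    with open(out_path, "rb") as f:
        md5 = hashlib.md5(f.read(size)).hexdigest()
    with open(cpt_path, "w") as f:
        f.write(f"{size} {md5}\n")

def restart_append(cpt_path, out_path):
    """Verify the checkpointed prefix, discard any post-checkpoint tail,
    and return a handle positioned for safe appending."""
    with open(cpt_path) as f:
        size_str, md5 = f.read().split()
    size = int(size_str)
    with open(out_path, "rb") as f:
        prefix = f.read(size)
    if len(prefix) != size or hashlib.md5(prefix).hexdigest() != md5:
        raise RuntimeError("Fatal error: output file does not match checkpoint")
    with open(out_path, "r+b") as f:
        f.truncate(size)          # drop frames written after the checkpoint
    return open(out_path, "ab")   # appending from the verified offset is safe
```

On this model a second crash between the restart and the next checkpoint is harmless: every restart re-verifies and re-truncates to the last checkpointed offset, so data written before that offset is never discarded.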
This feature has been in the code for a year now, and while some minor issues have been fixed since the 4.5 release, it would surprise me greatly if your claim were true. You're saying the equivalent of the steps below can occur:

1. The simulation wanders along normally and writes a checkpoint at step 1003.
2. A random crash happens at step 1106.
3. An -append restart from the old .tpr and the recent .cpt file restarts from step 1003.
4. A random crash happens at step 1059.
5. Now a restart doesn't restart from step 1003, but from some other step.

and most importantly, the most important piece of data, the trajectory file, could be completely lost! I don't know the code behind the checkpointing/appending, but I can see how easily one can overwrite 100 ns trajectories, for example, and end up with the same trajectories at size 0.

I don't see how easy that is, without a concrete example where user error is not possible.

Using the checkpoint/appending capability makes sense when many restarts are expected, but unfortunately it is exactly then that these options completely fail! As a new user of GROMACS, I must say I am disappointed, and would like an explanation of why the usage of these options is clearly stated to be safe when it is not, why the append option is the default, and why not even a single warning has been posted anywhere in the docs or manuals.

I can understand and sympathize with your frustration if you've experienced the loss of a simulation. Do be careful when suggesting that others' actions are blameworthy, however. The developers all act in good faith on a largely volunteer basis. Errors in coding do happen, and they get attention as developers' time permits. However, developers' time rarely permits addressing "feature X doesn't work, why not?" in a productive way. Solving bugs can be hard, but they will be solved more easily (and faster!) if the user who thinks a problem exists follows good procedure.
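If steps 3-5 above really can produce a zero-length trajectory, one plausible low-level mechanism (purely illustrative Python, not a claim about GROMACS's code) is a restart path that reopens an output file in truncating mode before any new frame is flushed:

```python
def unsafe_restart(out_path, new_frames):
    """UNSAFE restart pattern: mode "w" truncates the file to zero bytes
    the moment it is opened, so a second crash arriving before the first
    new frame is flushed leaves a 0-byte file -- consistent with the
    zero-size files in the listings earlier in the thread."""
    f = open(out_path, "w")    # the old contents are destroyed right here
    for frame in new_frames:   # a crash before or inside this loop
        f.write(frame)         # ...loses the entire previous trajectory
    f.close()
```

By contrast, a seek-and-truncate append (reopen with "r+b" and truncate to the checkpointed offset) never discards data written before the checkpoint, which is why a safe append implementation must avoid truncating open modes entirely.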
See http://www.chiark.greenend.org.uk/~sgtatham/bugs.html

Mark