Dear Users:

I have 50 simulations that are identical except for the random seeds used to generate velocities. All were running fine for 24 hours. I canceled the running jobs and resubmitted them as part of beta testing a new cluster. All 50 started. I then canceled one of these jobs soon after it started and restarted it pretty quickly (possibly too quickly). This restart gave me the error:

Fatal error:
Failed to lock: continue.log. Already running simulation?
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors

I found this post about this possibly being related to the Lustre filesystem:
http://lists.gromacs.org/pipermail/gmx-users/2010-November/056173.html

But I am not sure how to figure out if that is being used. Here is the output from mount:
[nealechr@ip13-mp2 50]$ mount
/dev/mapper/hddvg-root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/md0 on /boot type ext4 (rw)
/dev/mapper/hddvg-home on /home type ext4 (rw,usrquota,grpquota)
/dev/md2 on /ltmp type ext4 (rw)
/dev/mapper/hddvg-opt on /opt type ext4 (rw)
none on /ramdisk type tmpfs (rw,nosuid,nodev)
none on /var/tmp type tmpfs (rw,noexec,nosuid,nodev,size=1000000000)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
none on /ipathfs type ipathfs (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)
none on /tmp type tmpfs (rw,noexec,nosuid,nodev,size=1000000000)
10.4.215.201@o2ib:/lustre01 on /mnt/scratch01 type lustre (rw,_netdev,flock)
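
For reference, the mount listing above can be checked mechanically. The following is a minimal Python sketch, not a GROMACS tool: it parses text in the `DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)` layout shown above and reports which mount a given path falls under. The function names and the example run directory are hypothetical, since the actual location of the simulation directory is not given here.

```python
import posixpath


def parse_mount_output(text):
    """Parse lines of the form 'DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)'."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that does not match the expected layout (e.g. the shell prompt).
        if len(parts) >= 6 and parts[1] == "on" and parts[3] == "type":
            options = parts[5].strip("()").split(",")
            entries.append((parts[2], parts[4], options))
    return entries


def filesystem_for(path, entries):
    """Return (mount_point, fstype, options) for the longest mount point containing path."""
    path = posixpath.normpath(path)
    best = None
    for mount_point, fstype, options in entries:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fstype, options)
    return best


if __name__ == "__main__":
    import subprocess

    mounts = parse_mount_output(subprocess.check_output(["mount"], text=True))
    # Hypothetical run directory; substitute the real one.
    print(filesystem_for("/mnt/scratch01/some/run/50", mounts))
```

By this reading, anything under /mnt/scratch01 is on Lustre, while /home and / are ext4. As far as I understand the Lustre mount options, `flock` in the options list means that file locking is enabled on that mount.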

Also, it seems unlikely to be system related because the other 49 runs are going just fine. I ran ls -la to see if there was some hidden file indicating the lock but could not find one (I have no idea how such a lock would work or be detected).
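
On how such a lock might be detected: the sketch below assumes the lock is a POSIX advisory lock taken with fcntl, which I believe is what mdrun uses but have not verified. Such a lock is kernel (or, on a network filesystem, lock-manager) state rather than a file, so it would not show up in a directory listing. The helper name `can_lock` is hypothetical.

```python
import fcntl


def can_lock(path):
    """Return True if an exclusive advisory lock on path can be taken right now.

    Assumes POSIX (fcntl-style) advisory locking. The lock is released again
    before returning, so this only probes the current state.
    """
    with open(path, "r+") as handle:
        try:
            # LOCK_NB makes the call fail immediately instead of waiting.
            fcntl.lockf(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False  # another process holds a conflicting lock
        fcntl.lockf(handle, fcntl.LOCK_UN)
        return True
```

Calling `can_lock("continue.log")` from a separate process in the run directory would return False while another process still holds a lock on the file. It should not be called from inside the process that owns the lock, because POSIX locks are per process and closing the file would drop that process's own lock. Under the same assumption, copying the files creates new files with no lock attached, which would be consistent with the copy-back workaround succeeding.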

I deleted the .log file, but then I get the error:

Fatal error:
File appending requested, but only 3 of the 4 output files are present

Moving everything to a new directory and then copying it back (including the original .log file) allowed me to run the simulation.

Did I do something incorrectly, or is this a bona fide problem?

Thank you,
Chris.


--
gmx-users mailing list    gmx-users@gromacs.org
http://lists.gromacs.org/mailman/listinfo/gmx-users
Please search the archive at 
http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
Please don't post (un)subscribe requests to the list. Use the
www interface or send it to gmx-users-requ...@gromacs.org.
Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
