Re: [gmx-users] restart error

2016-06-23 Thread ingram

Great thank you!

On 2016-06-23 09:39, Mark Abraham wrote:

Hi,

There are two possibilities here.

1) GROMACS has a bug with multi-simulation checkpointing - several people
are reporting problems, and it's probably getting an overhaul for the 2016
release because it was far from clear the old version was always working.

2) Your (parallel) file system isn't working well, so that output files
that are reported to the old run of GROMACS as being flushed to disk are
actually not flushed to disk, so that when the old run of GROMACS reads the
output files "from disk" to compute the checksum it gets lied to again.
This information gets written into the checkpoint file. That's OK if the
output file really gets written to disk later on, but sometimes this
doesn't happen, particularly upon some kind of failure such as loss of
power. You can diagnose this by looking at the modification times of e.g.
your .log files. Those of the first two replicas have probably been
modified 15 minutes before all the other ones, i.e. at the previous
checkpointing stage. If so, complain to your system admins.

You've truncated the error message there, but you can note that GROMACS is
merely refusing to do appending to the old files. You can make a backup of
your files and re-start with mdrun -noappend, but whatever information
didn't get written won't be available to you for subsequent analysis.

Mark

On Thu, Jun 23, 2016 at 2:34 AM ingram wrote:

Dear Grommunity,

When I try to restart with the command "mpiexec -np 192 mdrun_mpi -v
-deffnm md_golp_vacuo -s topol.tpr -cpi md_golp_vacuo.cpt -multidir
simann59 simann60 simann61 simann62 simann63 simann64 simann65 simann66
simann67 simann68 simann69 simann70 simann71 simann72 simann73 simann74"
I get the error "Fatal error: Can't read 187477 bytes of
'md_golp_vacuo.log' to compute checksum". I then see that the
simulations where this occurs are far behind the others, for example:

Step   Time Lambda
310031000.00.0
Step   Time Lambda
320032000.00.0
Step   Time Lambda
575057500.00.0
Step   Time Lambda
575057500.00.0
Step   Time Lambda
575057500.00.0
Step   Time Lambda
575057500.00.0
Step   Time Lambda
575057500.00.0
Step   Time Lambda
575057500.00.0
Step   Time Lambda
575057500.00.0
Step   Time Lambda
575057500.00.0
Step   Time Lambda
575057500.00.0
Step   Time Lambda
575057500.00.0
Step   Time Lambda
575057500.00.0
Step   Time Lambda
575057500.00.0
Step   Time Lambda
575057500.00.0
Step   Time Lambda
575057500.00.0

I have already posted about this issue, and I thought I had made the
mistake. I now believe this to be a bug in GROMACS, but please tell me if
this still seems like a user error and not GROMACS. I am using GROMACS
5.1.2.

Best

Teresa

--
Gromacs Users mailing list

* Please search the archive at
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
send a mail to gmx-users-requ...@gromacs.org.



Re: [gmx-users] restart error

2016-06-23 Thread Mark Abraham
Hi,

There are two possibilities here.

1) GROMACS has a bug with multi-simulation checkpointing - several people
are reporting problems, and it's probably getting an overhaul for the 2016
release because it was far from clear the old version was always working.

2) Your (parallel) file system isn't working well, so that output files
that are reported to the old run of GROMACS as being flushed to disk are
actually not flushed to disk, so that when the old run of GROMACS reads the
output files "from disk" to compute the checksum it gets lied to again.
This information gets written into the checkpoint file. That's OK if the
output file really gets written to disk later on, but sometimes this
doesn't happen, particularly upon some kind of failure such as loss of
power. You can diagnose this by looking at the modification times of e.g.
your .log files. Those of the first two replicas have probably been
modified 15 minutes before all the other ones, i.e. at the previous
checkpointing stage. If so, complain to your system admins.
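
The modification-time check described above could be scripted roughly as
follows. This is only a sketch: the directory names and the log file name
are taken from the command quoted in this thread, while the function names
and the 60-second tolerance are assumptions chosen for illustration.

```python
import os
import time


def log_mtimes(dirs, log_name="md_golp_vacuo.log"):
    """Return {directory: mtime in seconds} for each replica log that exists."""
    mtimes = {}
    for d in dirs:
        path = os.path.join(d, log_name)
        if os.path.exists(path):
            mtimes[d] = os.path.getmtime(path)
    return mtimes


def stale_replicas(mtimes, tolerance_s=60.0):
    """Return directories whose log is older than the newest log by more
    than tolerance_s seconds (the tolerance is an assumed value)."""
    if not mtimes:
        return []
    newest = max(mtimes.values())
    return sorted(d for d, t in mtimes.items() if newest - t > tolerance_s)


if __name__ == "__main__":
    # simann59 .. simann74, as in the -multidir list quoted in this thread
    dirs = [f"simann{i}" for i in range(59, 75)]
    mtimes = log_mtimes(dirs)
    for d in sorted(mtimes):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtimes[d]))
        print(d, stamp)
    print("lagging:", stale_replicas(mtimes))
```

If the replicas listed as lagging are the same ones that stopped at an
earlier step, that would be consistent with the file system explanation,
though it does not rule out the first possibility.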

You've truncated the error message there, but you can note that GROMACS is
merely refusing to do appending to the old files. You can make a backup of
your files and re-start with mdrun -noappend, but whatever information
didn't get written won't be available to you for subsequent analysis.
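
A minimal sketch of the backup-then-restart step, assuming the same file
names and rank count as the command quoted in this thread. The helper names
and the ".bak" suffix are hypothetical; only -noappend and the other flags
shown in the original command come from the thread.

```python
import shutil
from pathlib import Path


def backup_dirs(dirs, suffix=".bak"):
    """Copy each replica directory to '<dir><suffix>' (hypothetical naming)."""
    copies = []
    for d in dirs:
        src = Path(d)
        dst = src.with_name(src.name + suffix)
        # copytree raises FileExistsError rather than overwrite an old backup
        shutil.copytree(src, dst)
        copies.append(dst)
    return copies


def noappend_command(dirs, nranks=192, deffnm="md_golp_vacuo"):
    """Build the restart command from the thread with -noappend added."""
    return [
        "mpiexec", "-np", str(nranks), "mdrun_mpi", "-v",
        "-deffnm", deffnm, "-s", "topol.tpr", "-cpi", f"{deffnm}.cpt",
        "-noappend",
        "-multidir", *[str(d) for d in dirs],
    ]


if __name__ == "__main__":
    dirs = [f"simann{i}" for i in range(59, 75)]
    # Only print the command here; call backup_dirs(dirs) before running it.
    print(" ".join(noappend_command(dirs)))
```

With -noappend, mdrun writes new output files with the simulation part
number added to their names instead of extending the old files, so the
backed-up files and the new parts have to be combined by hand for analysis.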

Mark

On Thu, Jun 23, 2016 at 2:34 AM ingram  wrote:

> Dear Grommunity,
>
> When I try to restart with the command "mpiexec -np 192 mdrun_mpi -v
> -deffnm md_golp_vacuo -s topol.tpr -cpi md_golp_vacuo.cpt -multidir
> simann59 simann60 simann61 simann62 simann63 simann64 simann65 simann66
> simann67 simann68 simann69 simann70 simann71 simann72 simann73 simann74"
> I get the error "Fatal error: Can't read 187477 bytes of
> 'md_golp_vacuo.log' to compute checksum". I then see that the
> simulations where this occurs are far behind the others, for example:
>
> Step   Time Lambda
> 310031000.00.0
> Step   Time Lambda
> 320032000.00.0
> Step   Time Lambda
> 575057500.00.0
> Step   Time Lambda
> 575057500.00.0
> Step   Time Lambda
> 575057500.00.0
> Step   Time Lambda
> 575057500.00.0
> Step   Time Lambda
> 575057500.00.0
> Step   Time Lambda
> 575057500.00.0
> Step   Time Lambda
> 575057500.00.0
> Step   Time Lambda
> 575057500.00.0
> Step   Time Lambda
> 575057500.00.0
> Step   Time Lambda
> 575057500.00.0
> Step   Time Lambda
> 575057500.00.0
> Step   Time Lambda
> 575057500.00.0
> Step   Time Lambda
> 575057500.00.0
> Step   Time Lambda
> 575057500.00.0
>
> I have already posted about this issue, and I thought I had made the
> mistake. I now believe this to be a bug in GROMACS, but please tell me if
> this still seems like a user error and not GROMACS. I am using GROMACS
> 5.1.2.
>
> Best
>
> Teresa