Re: [gmx-users] Build time/Build user mismatch, fatal error truncation of file *.xtc failed

2016-06-23 Thread Mark Abraham
Hi,

That depends on how your MPI is implemented, but what you really want is a
filesystem visible on each node. Since mpirun gmx_mpi mdrun is working,
though, it's fine.
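A quick way to check this from inside the job script, before mdrun runs, is to have every allocated node report whether it can see the installation and the run directory (a sketch only — the install path is illustrative, and the srun options assume Slurm):

```shell
#!/bin/bash
#SBATCH -N 2
#SBATCH -n 16

# One task per node: each prints its hostname and whether it can see the
# same gmx_mpi binary and the submit directory. A "FAIL" line identifies a
# node that cannot reach the shared filesystem.
srun --ntasks-per-node=1 sh -c '
  host=$(hostname)
  if [ -x /usr/local/gromacs/bin/gmx_mpi ] && [ -d "$SLURM_SUBMIT_DIR" ]; then
    echo "OK:   $host"
  else
    echo "FAIL: $host"
  fi
'

mpirun gmx_mpi mdrun -cpi md_gmx.cpt -deffnm md_gmx
```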

Mark
-- 
Gromacs Users mailing list

* Please search the archive at 
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
mail to gmx-users-requ...@gromacs.org.


Re: [gmx-users] Build time/Build user mismatch, fatal error truncation of file *.xtc failed

2016-06-23 Thread Husen R
Hi,

I'm wondering: if I use GROMACS in a cluster environment, do I have to
install GROMACS on every node (at /usr/local/gromacs on each node), or is
it enough to install it on one node only (for example, on the head node)?

Regards,

Husen



On Thu, Jun 23, 2016 at 3:41 PM, Mark Abraham 
wrote:

> Hi,
>
> The only explanation is that that file is not in fact properly accessible
> if rank 0 is placed other than on "compute-node," which means your
> organization of file system / slurm / etc. aren't good enough for what
> you're doing.
>
> Mark
>
> On Thu, Jun 23, 2016 at 10:15 AM Husen R  wrote:
>
> > Hi,
> >
> > I still unable to find out the cause of the fatal error.
> > Previously, gromacs is installed in every nodes. That's the cause Build
> > time mismatch and Build user mismatch appeared.
> > Now, Build time mismatch and Build user mismatch issues are solved by
> > installing Gromacs in shared directory.
> >
> > I have tried to install gromacs in one node only (not in shared
> directory),
> > but the error appeared.
> >
> >
> > this is the error message if I exclude compute-node
> > "--exclude=compute-node" from nodelist in slurm sbatch. excluding other
> > nodes works fine.
> >
> >
> >
> >
> =
> > GROMACS:  gmx mdrun, VERSION 5.1.2
> > Executable:   /mirror/source/gromacs/bin/gmx_mpi
> > Data prefix:  /mirror/source/gromacs
> > Command line:
> >   gmx_mpi mdrun -cpi md_gmx.cpt -deffnm md_gmx
> >
> >
> > Running on 2 nodes with total 8 cores, 16 logical cores
> >   Cores per node:4
> >   Logical cores per node:8
> > Hardware detected on host head-node (the node of MPI rank 0):
> >   CPU info:
> > Vendor: GenuineIntel
> > Brand:  Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
> > SIMD instructions most likely to fit this hardware: AVX_256
> > SIMD instructions selected at GROMACS compile time: AVX_256
> >
> > Reading file md_gmx.tpr, VERSION 5.1.2 (single precision)
> > Changing nstlist from 10 to 20, rlist from 1 to 1.03
> >
> >
> > Reading checkpoint file md_gmx.cpt generated: Thu Jun 23 12:54:02 2016
> >
> >
> >   #ranks mismatch,
> > current program: 16
> > checkpoint file: 24
> >
> >   #PME-ranks mismatch,
> > current program: -1
> > checkpoint file: 6
> >
> > GROMACS patchlevel, binary or parallel settings differ from previous run.
> > Continuation is exact, but not guaranteed to be binary identical.
> >
> >
> > ---
> > Program gmx mdrun, VERSION 5.1.2
> > Source code file:
> >
> /home/necis/gromacsinstall/gromacs-5.1.2/src/gromacs/gmxlib/checkpoint.cpp,
> > line: 2216
> >
> > Fatal error:
> > Truncation of file md_gmx.xtc failed. Cannot do appending because of this
> > failure.
> > For more information and tips for troubleshooting, please check the
> GROMACS
> > website at http://www.gromacs.org/Documentation/Errors
> >
> >
> 
> >
> > On Thu, Jun 16, 2016 at 6:23 PM, Mark Abraham 
> > wrote:
> >
> > > Hi,
> > >
> > > On Thu, Jun 16, 2016 at 12:24 PM Husen R  wrote:
> > >
> > > > On Thu, Jun 16, 2016 at 4:01 PM, Mark Abraham <
> > mark.j.abra...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > There's just nothing special about any node at run time.
> > > > >
> > > > > Your script looks like it is building GROMACS fresh each time -
> > there's
> > > > no
> > > > > need to do that,
> > > >
> > > >
> > > > which part of my script ?
> > > >
> > >
> > > I can't tell how your script is finding its GROMACS installations, but
> > the
> > > advisory message says precisely that your runs are finding different
> > > installations...
> > >
> > >   Build time mismatch,
> > > current program: Sel Apr  5 13:37:32 WIB 2016
> > > checkpoint file: Rab Apr  6 09:44:51 WIB 2016
> > >
> > >   Build user mismatch,
> > > current program: pro@head-node [CMAKE]
> > > checkpoint file: pro@compute-node [CMAKE]
> > >
> > > This reinforces my impression that the view of your file system
> available
> > > at the start of the job script is varying with your choice of node
> > subsets.
> > >
> > >
> > > > I always use this command to restart from checkpoint file -->
> "mpirun
> > > > gmx_mpi mdrun -cpi [name].cpt -deffnm [name]".
> > > > as far as I know -cpi option is used to refer to checkpoint file as
> > input
> > > > file.
> > > >  what I have to change in my script ?
> > > >
> > >
> > > Nothing about that aspect. But clearly your first run and the restart
> > > simulating loss of a node are finding different gmx_mpi binaries from
> > their
> > > respective environments. This is not itself a problem, but it's
> probably
> > > not what you intend, and may be symptomatic of the same issue that
> leads
> > to
> > > 

Re: [gmx-users] Build time/Build user mismatch, fatal error truncation of file *.xtc failed

2016-06-23 Thread Mark Abraham
Hi,

The only explanation is that that file is not in fact properly accessible
if rank 0 is placed anywhere other than on "compute-node," which means your
organization of file system / Slurm / etc. isn't good enough for what
you're doing.
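One way to see where rank 0 may land, and whether that node can reach the output file, is to probe it just before mdrun (a sketch; exact rank placement depends on your MPI implementation, and md_gmx.xtc is the file from this thread):

```shell
# Probe: run a single task through the scheduler and check the file from
# wherever it lands. This approximates, but does not guarantee, where
# mpirun will place rank 0.
srun -n 1 sh -c 'echo "probe task on: $(hostname)"; ls -l md_gmx.xtc'

mpirun gmx_mpi mdrun -cpi md_gmx.cpt -deffnm md_gmx
```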

Mark

On Thu, Jun 23, 2016 at 10:15 AM Husen R  wrote:

> Hi,
>
> I still unable to find out the cause of the fatal error.
> Previously, gromacs is installed in every nodes. That's the cause Build
> time mismatch and Build user mismatch appeared.
> Now, Build time mismatch and Build user mismatch issues are solved by
> installing Gromacs in shared directory.
>
> I have tried to install gromacs in one node only (not in shared directory),
> but the error appeared.
>
>
> this is the error message if I exclude compute-node
> "--exclude=compute-node" from nodelist in slurm sbatch. excluding other
> nodes works fine.
>
>
>
> =
> GROMACS:  gmx mdrun, VERSION 5.1.2
> Executable:   /mirror/source/gromacs/bin/gmx_mpi
> Data prefix:  /mirror/source/gromacs
> Command line:
>   gmx_mpi mdrun -cpi md_gmx.cpt -deffnm md_gmx
>
>
> Running on 2 nodes with total 8 cores, 16 logical cores
>   Cores per node:4
>   Logical cores per node:8
> Hardware detected on host head-node (the node of MPI rank 0):
>   CPU info:
> Vendor: GenuineIntel
> Brand:  Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
> SIMD instructions most likely to fit this hardware: AVX_256
> SIMD instructions selected at GROMACS compile time: AVX_256
>
> Reading file md_gmx.tpr, VERSION 5.1.2 (single precision)
> Changing nstlist from 10 to 20, rlist from 1 to 1.03
>
>
> Reading checkpoint file md_gmx.cpt generated: Thu Jun 23 12:54:02 2016
>
>
>   #ranks mismatch,
> current program: 16
> checkpoint file: 24
>
>   #PME-ranks mismatch,
> current program: -1
> checkpoint file: 6
>
> GROMACS patchlevel, binary or parallel settings differ from previous run.
> Continuation is exact, but not guaranteed to be binary identical.
>
>
> ---
> Program gmx mdrun, VERSION 5.1.2
> Source code file:
> /home/necis/gromacsinstall/gromacs-5.1.2/src/gromacs/gmxlib/checkpoint.cpp,
> line: 2216
>
> Fatal error:
> Truncation of file md_gmx.xtc failed. Cannot do appending because of this
> failure.
> For more information and tips for troubleshooting, please check the GROMACS
> website at http://www.gromacs.org/Documentation/Errors
>
> 
>
> On Thu, Jun 16, 2016 at 6:23 PM, Mark Abraham 
> wrote:
>
> > Hi,
> >
> > On Thu, Jun 16, 2016 at 12:24 PM Husen R  wrote:
> >
> > > On Thu, Jun 16, 2016 at 4:01 PM, Mark Abraham <
> mark.j.abra...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > There's just nothing special about any node at run time.
> > > >
> > > > Your script looks like it is building GROMACS fresh each time -
> there's
> > > no
> > > > need to do that,
> > >
> > >
> > > which part of my script ?
> > >
> >
> > I can't tell how your script is finding its GROMACS installations, but
> the
> > advisory message says precisely that your runs are finding different
> > installations...
> >
> >   Build time mismatch,
> > current program: Sel Apr  5 13:37:32 WIB 2016
> > checkpoint file: Rab Apr  6 09:44:51 WIB 2016
> >
> >   Build user mismatch,
> > current program: pro@head-node [CMAKE]
> > checkpoint file: pro@compute-node [CMAKE]
> >
> > This reinforces my impression that the view of your file system available
> > at the start of the job script is varying with your choice of node
> subsets.
> >
> >
> > > I always use this command to restart from checkpoint file -->  "mpirun
> > > gmx_mpi mdrun -cpi [name].cpt -deffnm [name]".
> > > as far as I know -cpi option is used to refer to checkpoint file as
> input
> > > file.
> > >  what I have to change in my script ?
> > >
> >
> > Nothing about that aspect. But clearly your first run and the restart
> > simulating loss of a node are finding different gmx_mpi binaries from
> their
> > respective environments. This is not itself a problem, but it's probably
> > not what you intend, and may be symptomatic of the same issue that leads
> to
> > md_test.xtc not being accessible.
> >
> > Mark
> >
> >
> > >
> > > but the fact that the node name is showing up in the check
> > > > that takes place when the checkpoint is read is not relevant to the
> > > > problem.
> > > >
> > > > Mark
> > > >
> > > > On Thu, Jun 16, 2016 at 9:46 AM Husen R  wrote:
> > > >
> > > > > On Thu, Jun 16, 2016 at 2:32 PM, Mark Abraham <
> > > mark.j.abra...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > On Thu, Jun 16, 2016 at 9:30 AM Husen R 
> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > 

Re: [gmx-users] Build time/Build user mismatch, fatal error truncation of file *.xtc failed

2016-06-23 Thread Husen R
Hi,

I am still unable to find out the cause of the fatal error.
Previously, GROMACS was installed on every node; that is why the Build time
mismatch and Build user mismatch appeared.
Those mismatch warnings are now solved by installing GROMACS in a shared
directory.

I have also tried installing GROMACS on one node only (not in a shared
directory), but the error still appeared.


This is the error message I get if I exclude compute-node
("--exclude=compute-node") from the nodelist in the Slurm sbatch script;
excluding other nodes works fine.


=
GROMACS:  gmx mdrun, VERSION 5.1.2
Executable:   /mirror/source/gromacs/bin/gmx_mpi
Data prefix:  /mirror/source/gromacs
Command line:
  gmx_mpi mdrun -cpi md_gmx.cpt -deffnm md_gmx


Running on 2 nodes with total 8 cores, 16 logical cores
  Cores per node:4
  Logical cores per node:8
Hardware detected on host head-node (the node of MPI rank 0):
  CPU info:
Vendor: GenuineIntel
Brand:  Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
SIMD instructions most likely to fit this hardware: AVX_256
SIMD instructions selected at GROMACS compile time: AVX_256

Reading file md_gmx.tpr, VERSION 5.1.2 (single precision)
Changing nstlist from 10 to 20, rlist from 1 to 1.03


Reading checkpoint file md_gmx.cpt generated: Thu Jun 23 12:54:02 2016


  #ranks mismatch,
current program: 16
checkpoint file: 24

  #PME-ranks mismatch,
current program: -1
checkpoint file: 6

GROMACS patchlevel, binary or parallel settings differ from previous run.
Continuation is exact, but not guaranteed to be binary identical.


---
Program gmx mdrun, VERSION 5.1.2
Source code file:
/home/necis/gromacsinstall/gromacs-5.1.2/src/gromacs/gmxlib/checkpoint.cpp,
line: 2216

Fatal error:
Truncation of file md_gmx.xtc failed. Cannot do appending because of this
failure.
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
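The truncation failure means the process doing the appending could not modify the file. A minimal pre-flight check of the files mdrun must append to can be run from the job's working directory before restarting (a sketch — the preflight name and the temporary demonstration file are invented for illustration; in a real job you would pass md_gmx.xtc, md_gmx.log, etc.):

```shell
# preflight: verify each named file exists and is writable from here.
# Run it on the node that will host rank 0, e.g. via `srun -n 1`.
preflight() {
  status=0
  for f in "$@"; do
    if [ -e "$f" ] && [ -w "$f" ]; then
      echo "OK: $f"
    else
      echo "FAIL: $f missing or not writable"
      status=1
    fi
  done
  return $status
}

# demonstration with a temporary file standing in for md_gmx.xtc
tmp=$(mktemp)
preflight "$tmp"
rm -f "$tmp"
```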


On Thu, Jun 16, 2016 at 6:23 PM, Mark Abraham 
wrote:

> Hi,
>
> On Thu, Jun 16, 2016 at 12:24 PM Husen R  wrote:
>
> > On Thu, Jun 16, 2016 at 4:01 PM, Mark Abraham 
> > wrote:
> >
> > > Hi,
> > >
> > > There's just nothing special about any node at run time.
> > >
> > > Your script looks like it is building GROMACS fresh each time - there's
> > no
> > > need to do that,
> >
> >
> > which part of my script ?
> >
>
> I can't tell how your script is finding its GROMACS installations, but the
> advisory message says precisely that your runs are finding different
> installations...
>
>   Build time mismatch,
> current program: Sel Apr  5 13:37:32 WIB 2016
> checkpoint file: Rab Apr  6 09:44:51 WIB 2016
>
>   Build user mismatch,
> current program: pro@head-node [CMAKE]
> checkpoint file: pro@compute-node [CMAKE]
>
> This reinforces my impression that the view of your file system available
> at the start of the job script is varying with your choice of node subsets.
>
>
> > I always use this command to restart from checkpoint file -->  "mpirun
> > gmx_mpi mdrun -cpi [name].cpt -deffnm [name]".
> > as far as I know -cpi option is used to refer to checkpoint file as input
> > file.
> >  what I have to change in my script ?
> >
>
> Nothing about that aspect. But clearly your first run and the restart
> simulating loss of a node are finding different gmx_mpi binaries from their
> respective environments. This is not itself a problem, but it's probably
> not what you intend, and may be symptomatic of the same issue that leads to
> md_test.xtc not being accessible.
>
> Mark
>
>
> >
> > but the fact that the node name is showing up in the check
> > > that takes place when the checkpoint is read is not relevant to the
> > > problem.
> > >
> > > Mark
> > >
> > > On Thu, Jun 16, 2016 at 9:46 AM Husen R  wrote:
> > >
> > > > On Thu, Jun 16, 2016 at 2:32 PM, Mark Abraham <
> > mark.j.abra...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > On Thu, Jun 16, 2016 at 9:30 AM Husen R  wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Thank you for your reply !
> > > > > >
> > > > > > md_test.xtc is exist and writable.
> > > > > >
> > > > >
> > > > > OK, but it needs to be seen that way from the set of compute nodes
> > you
> > > > are
> > > > > using, and organizing that is up to you and your job scheduler,
> etc.
> > > > >
> > > > >
> > > > > > I tried to restart from checkpoint file by excluding other node
> > than
> > > > > > compute-node and it works.
> > > > > >
> > > > >
> > > > > Go do that, then :-)
> > > > >
> > > >
> > > > I'm building a simple system that can respond to node failure. if
> > failure

Re: [gmx-users] Build time/Build user mismatch, fatal error truncation of file *.xtc failed

2016-06-16 Thread Mark Abraham
Hi,

On Thu, Jun 16, 2016 at 12:24 PM Husen R  wrote:

> On Thu, Jun 16, 2016 at 4:01 PM, Mark Abraham 
> wrote:
>
> > Hi,
> >
> > There's just nothing special about any node at run time.
> >
> > Your script looks like it is building GROMACS fresh each time - there's
> no
> > need to do that,
>
>
> which part of my script ?
>

I can't tell how your script is finding its GROMACS installations, but the
advisory message says precisely that your runs are finding different
installations...

  Build time mismatch,
current program: Sel Apr  5 13:37:32 WIB 2016
checkpoint file: Rab Apr  6 09:44:51 WIB 2016

  Build user mismatch,
current program: pro@head-node [CMAKE]
checkpoint file: pro@compute-node [CMAKE]

This reinforces my impression that the view of your file system available
at the start of the job script is varying with your choice of node subsets.


> I always use this command to restart from checkpoint file -->  "mpirun
> gmx_mpi mdrun -cpi [name].cpt -deffnm [name]".
> as far as I know -cpi option is used to refer to checkpoint file as input
> file.
>  what I have to change in my script ?
>

Nothing about that aspect. But clearly your first run and the restart
simulating loss of a node are finding different gmx_mpi binaries from their
respective environments. This is not itself a problem, but it's probably
not what you intend, and may be symptomatic of the same issue that leads to
md_test.xtc not being accessible.

Mark


>
> but the fact that the node name is showing up in the check
> > that takes place when the checkpoint is read is not relevant to the
> > problem.
> >
> > Mark
> >
> > On Thu, Jun 16, 2016 at 9:46 AM Husen R  wrote:
> >
> > > On Thu, Jun 16, 2016 at 2:32 PM, Mark Abraham <
> mark.j.abra...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > On Thu, Jun 16, 2016 at 9:30 AM Husen R  wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Thank you for your reply !
> > > > >
> > > > > md_test.xtc is exist and writable.
> > > > >
> > > >
> > > > OK, but it needs to be seen that way from the set of compute nodes
> you
> > > are
> > > > using, and organizing that is up to you and your job scheduler, etc.
> > > >
> > > >
> > > > > I tried to restart from checkpoint file by excluding other node
> than
> > > > > compute-node and it works.
> > > > >
> > > >
> > > > Go do that, then :-)
> > > >
> > >
> > > I'm building a simple system that can respond to node failure. if
> failure
> > > occured on node A, than the application has to be restarted and that
> node
> > > has to be excluded.
> > > this should apply to all node including this 'compute-node'.
> > >
> > > >
> > > >
> > > > > only '--exclude=compute-node' that produces this error.
> > > > >
> > > >
> > > > Then there's something about that node that is special with respect
> to
> > > the
> > > > file system - there's nothing about any particular node that GROMACS
> > > cares
> > > > about.
> > > >
> > >
> > > > Mark
> > > >
> > > >
> > > > > is this has the same issue with this thread ?
> > > > > http://comments.gmane.org/gmane.science.biology.gromacs.user/40984
> > > > >
> > > > > regards,
> > > > >
> > > > > Husen
> > > > >
> > > > > On Thu, Jun 16, 2016 at 2:20 PM, Mark Abraham <
> > > mark.j.abra...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > The stuff about different nodes or numbers of nodes doesn't
> matter
> > -
> > > > it's
> > > > > > merely an advisory note from mdrun. mdrun failed when it tried to
> > > > operate
> > > > > > upon md_test.xtc, so perhaps you need to consider whether the
> file
> > > > > exists,
> > > > > > is writable, etc.
> > > > > >
> > > > > > Mark
> > > > > >
> > > > > > On Thu, Jun 16, 2016 at 6:48 AM Husen R 
> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I got the following error message when I tried to restart
> gromacs
> > > > > > > simulation from checkpoint file.
> > > > > > > I restart the simulation using fewer nodes and processes, and
> > also
> > > I
> > > > > > > exclude one node using '--exclude=' option (in slurm) for
> > > > experimental
> > > > > > > purpose.
> > > > > > >
> > > > > > > I'm sure fewer nodes and processes are not the cause of this
> > error
> > > > as I
> > > > > > > already test that.
> > > > > > > I have checked that the cause of this error is '--exclude='
> > usage.
> > > I
> > > > > > > excluded 1 node named 'compute-node' when restart from
> checkpoint
> > > (at
> > > > > > first
> > > > > > > run, I use all node including 'compute-node').
> > > > > > >
> > > > > > >
> > > > > > > it seems that at first run, the submit job script was built at
> > > > > > > compute-node. So, at restart, build user mismatch appeared
> > because
> > > > > > > compute-node was not found (excluded).
> > > > > > >
> > > > > > > Am I right ? is this behavior normal ?
> > > > > > > or is that a way to avoid this, so I can 

Re: [gmx-users] Build time/Build user mismatch, fatal error truncation of file *.xtc failed

2016-06-16 Thread Husen R
On Thu, Jun 16, 2016 at 4:01 PM, Mark Abraham 
wrote:

> Hi,
>
> There's just nothing special about any node at run time.
>
> Your script looks like it is building GROMACS fresh each time - there's no
> need to do that,


Which part of my script?
I always use this command to restart from a checkpoint file: "mpirun
gmx_mpi mdrun -cpi [name].cpt -deffnm [name]".
As far as I know, the -cpi option specifies the checkpoint file as input.
What do I have to change in my script?
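For reference, a restart job script in the shape this thread describes (a sketch; the job name, node counts, and file names are illustrative). If appending keeps failing, mdrun's -noappend option writes new, part-numbered output files instead of truncating the old ones:

```shell
#!/bin/bash
#SBATCH -J md_restart
#SBATCH -N 2
#SBATCH -n 16
#SBATCH --time=144:00:00

# Continue from the checkpoint; -deffnm keeps all file names consistent.
mpirun gmx_mpi mdrun -cpi md_test.cpt -deffnm md_test

# If truncation of md_test.xtc still fails, appending can be avoided:
#   mpirun gmx_mpi mdrun -cpi md_test.cpt -deffnm md_test -noappend
```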


but the fact that the node name is showing up in the check
> that takes place when the checkpoint is read is not relevant to the
> problem.
>
> Mark
>
> On Thu, Jun 16, 2016 at 9:46 AM Husen R  wrote:
>
> > On Thu, Jun 16, 2016 at 2:32 PM, Mark Abraham 
> > wrote:
> >
> > > Hi,
> > >
> > > On Thu, Jun 16, 2016 at 9:30 AM Husen R  wrote:
> > >
> > > > Hi,
> > > >
> > > > Thank you for your reply !
> > > >
> > > > md_test.xtc is exist and writable.
> > > >
> > >
> > > OK, but it needs to be seen that way from the set of compute nodes you
> > are
> > > using, and organizing that is up to you and your job scheduler, etc.
> > >
> > >
> > > > I tried to restart from checkpoint file by excluding other node than
> > > > compute-node and it works.
> > > >
> > >
> > > Go do that, then :-)
> > >
> >
> > I'm building a simple system that can respond to node failure. if failure
> > occured on node A, than the application has to be restarted and that node
> > has to be excluded.
> > this should apply to all node including this 'compute-node'.
> >
> > >
> > >
> > > > only '--exclude=compute-node' that produces this error.
> > > >
> > >
> > > Then there's something about that node that is special with respect to
> > the
> > > file system - there's nothing about any particular node that GROMACS
> > cares
> > > about.
> > >
> >
> > > Mark
> > >
> > >
> > > > is this has the same issue with this thread ?
> > > > http://comments.gmane.org/gmane.science.biology.gromacs.user/40984
> > > >
> > > > regards,
> > > >
> > > > Husen
> > > >
> > > > On Thu, Jun 16, 2016 at 2:20 PM, Mark Abraham <
> > mark.j.abra...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > The stuff about different nodes or numbers of nodes doesn't matter
> -
> > > it's
> > > > > merely an advisory note from mdrun. mdrun failed when it tried to
> > > operate
> > > > > upon md_test.xtc, so perhaps you need to consider whether the file
> > > > exists,
> > > > > is writable, etc.
> > > > >
> > > > > Mark
> > > > >
> > > > > On Thu, Jun 16, 2016 at 6:48 AM Husen R  wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I got the following error message when I tried to restart gromacs
> > > > > > simulation from checkpoint file.
> > > > > > I restart the simulation using fewer nodes and processes, and
> also
> > I
> > > > > > exclude one node using '--exclude=' option (in slurm) for
> > > experimental
> > > > > > purpose.
> > > > > >
> > > > > > I'm sure fewer nodes and processes are not the cause of this
> error
> > > as I
> > > > > > already test that.
> > > > > > I have checked that the cause of this error is '--exclude='
> usage.
> > I
> > > > > > excluded 1 node named 'compute-node' when restart from checkpoint
> > (at
> > > > > first
> > > > > > run, I use all node including 'compute-node').
> > > > > >
> > > > > >
> > > > > > it seems that at first run, the submit job script was built at
> > > > > > compute-node. So, at restart, build user mismatch appeared
> because
> > > > > > compute-node was not found (excluded).
> > > > > >
> > > > > > Am I right ? is this behavior normal ?
> > > > > > or is that a way to avoid this, so I can freely restart from
> > > checkpoint
> > > > > > using any nodes without limitation.
> > > > > >
> > > > > > thank you in advance
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > >
> > > > > > Husen
> > > > > >
> > > > > > ==restart script=
> > > > > > #!/bin/bash
> > > > > > #SBATCH -J ayo
> > > > > > #SBATCH -o md%j.out
> > > > > > #SBATCH -A necis
> > > > > > #SBATCH -N 2
> > > > > > #SBATCH -n 16
> > > > > > #SBATCH --exclude=compute-node
> > > > > > #SBATCH --time=144:00:00
> > > > > > #SBATCH --mail-user=hus...@gmail.com
> > > > > > #SBATCH --mail-type=begin
> > > > > > #SBATCH --mail-type=end
> > > > > >
> > > > > > mpirun gmx_mpi mdrun -cpi md_test.cpt -deffnm md_test
> > > > > > =
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > ==output
> > > error
> > > > > > Reading checkpoint file md_test.cpt generated: Wed Jun 15
> 16:30:44
> > > 2016
> > > > > >
> > > > > >
> > > > > >   Build time mismatch,
> > > > > > current program: Sel Apr  5 13:37:32 WIB 2016
> > > > > > checkpoint file: Rab Apr  6 09:44:51 WIB 2016
> > > > > >
> > > > > >   Build user 

Re: [gmx-users] Build time/Build user mismatch, fatal error truncation of file *.xtc failed

2016-06-16 Thread Mark Abraham
Hi,

There's just nothing special about any node at run time.

Your script looks like it is building GROMACS fresh each time - there's no
need to do that, but the fact that the node name is showing up in the check
that takes place when the checkpoint is read is not relevant to the problem.

Mark

On Thu, Jun 16, 2016 at 9:46 AM Husen R  wrote:

> On Thu, Jun 16, 2016 at 2:32 PM, Mark Abraham 
> wrote:
>
> > Hi,
> >
> > On Thu, Jun 16, 2016 at 9:30 AM Husen R  wrote:
> >
> > > Hi,
> > >
> > > Thank you for your reply !
> > >
> > > md_test.xtc is exist and writable.
> > >
> >
> > OK, but it needs to be seen that way from the set of compute nodes you
> are
> > using, and organizing that is up to you and your job scheduler, etc.
> >
> >
> > > I tried to restart from checkpoint file by excluding other node than
> > > compute-node and it works.
> > >
> >
> > Go do that, then :-)
> >
>
> I'm building a simple system that can respond to node failure. if failure
> occured on node A, than the application has to be restarted and that node
> has to be excluded.
> this should apply to all node including this 'compute-node'.
>
> >
> >
> > > only '--exclude=compute-node' that produces this error.
> > >
> >
> > Then there's something about that node that is special with respect to
> the
> > file system - there's nothing about any particular node that GROMACS
> cares
> > about.
> >
>
> > Mark
> >
> >
> > > is this has the same issue with this thread ?
> > > http://comments.gmane.org/gmane.science.biology.gromacs.user/40984
> > >
> > > regards,
> > >
> > > Husen
> > >
> > > On Thu, Jun 16, 2016 at 2:20 PM, Mark Abraham <
> mark.j.abra...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > The stuff about different nodes or numbers of nodes doesn't matter -
> > it's
> > > > merely an advisory note from mdrun. mdrun failed when it tried to
> > operate
> > > > upon md_test.xtc, so perhaps you need to consider whether the file
> > > exists,
> > > > is writable, etc.
> > > >
> > > > Mark
> > > >
> > > > On Thu, Jun 16, 2016 at 6:48 AM Husen R  wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I got the following error message when I tried to restart gromacs
> > > > > simulation from checkpoint file.
> > > > > I restart the simulation using fewer nodes and processes, and also
> I
> > > > > exclude one node using '--exclude=' option (in slurm) for
> > experimental
> > > > > purpose.
> > > > >
> > > > > I'm sure fewer nodes and processes are not the cause of this error
> > as I
> > > > > already test that.
> > > > > I have checked that the cause of this error is '--exclude=' usage.
> I
> > > > > excluded 1 node named 'compute-node' when restart from checkpoint
> (at
> > > > first
> > > > > run, I use all node including 'compute-node').
> > > > >
> > > > >
> > > > > it seems that at first run, the submit job script was built at
> > > > > compute-node. So, at restart, build user mismatch appeared because
> > > > > compute-node was not found (excluded).
> > > > >
> > > > > Am I right ? is this behavior normal ?
> > > > > or is that a way to avoid this, so I can freely restart from
> > checkpoint
> > > > > using any nodes without limitation.
> > > > >
> > > > > thank you in advance
> > > > >
> > > > > Regards,
> > > > >
> > > > >
> > > > > Husen
> > > > >
> > > > > ==restart script=
> > > > > #!/bin/bash
> > > > > #SBATCH -J ayo
> > > > > #SBATCH -o md%j.out
> > > > > #SBATCH -A necis
> > > > > #SBATCH -N 2
> > > > > #SBATCH -n 16
> > > > > #SBATCH --exclude=compute-node
> > > > > #SBATCH --time=144:00:00
> > > > > #SBATCH --mail-user=hus...@gmail.com
> > > > > #SBATCH --mail-type=begin
> > > > > #SBATCH --mail-type=end
> > > > >
> > > > > mpirun gmx_mpi mdrun -cpi md_test.cpt -deffnm md_test
> > > > > =
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > ==output
> > error
> > > > > Reading checkpoint file md_test.cpt generated: Wed Jun 15 16:30:44
> > 2016
> > > > >
> > > > >
> > > > >   Build time mismatch,
> > > > > current program: Sel Apr  5 13:37:32 WIB 2016
> > > > > checkpoint file: Rab Apr  6 09:44:51 WIB 2016
> > > > >
> > > > >   Build user mismatch,
> > > > > current program: pro@head-node [CMAKE]
> > > > > checkpoint file: pro@compute-node [CMAKE]
> > > > >
> > > > >   #ranks mismatch,
> > > > > current program: 16
> > > > > checkpoint file: 24
> > > > >
> > > > >   #PME-ranks mismatch,
> > > > > current program: -1
> > > > > checkpoint file: 6
> > > > >
> > > > > GROMACS patchlevel, binary or parallel settings differ from
> previous
> > > run.
> > > > > Continuation is exact, but not guaranteed to be binary identical.
> > > > >
> > > > >
> > > > > ---
> > > > > Program gmx mdrun, VERSION 5.1.2
> > > > > 

Re: [gmx-users] Build time/Build user mismatch, fatal error truncation of file *.xtc failed

2016-06-16 Thread Husen R
On Thu, Jun 16, 2016 at 2:32 PM, Mark Abraham 
wrote:

> Hi,
>
> On Thu, Jun 16, 2016 at 9:30 AM Husen R  wrote:
>
> > Hi,
> >
> > Thank you for your reply !
> >
> > md_test.xtc is exist and writable.
> >
>
> OK, but it needs to be seen that way from the set of compute nodes you are
> using, and organizing that is up to you and your job scheduler, etc.
>
>
> > I tried to restart from checkpoint file by excluding other node than
> > compute-node and it works.
> >
>
> Go do that, then :-)
>

I'm building a simple system that can respond to node failure. If a
failure occurs on node A, the application has to be restarted and that
node has to be excluded.
This should apply to every node, including this 'compute-node'.
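That resubmission step can be sketched as a small wrapper that rebuilds the --exclude list from the failed node names (a dry run — it only prints the sbatch command it would issue; the script and node names are invented):

```shell
# build_resubmit: given a job script and zero or more failed node names,
# print the sbatch command that restarts the job excluding those nodes.
build_resubmit() {
  script="$1"; shift
  if [ "$#" -eq 0 ]; then
    echo "sbatch $script"
  else
    # join the failed nodes into Slurm's comma-separated exclude list
    nodes=$(printf '%s,' "$@"); nodes=${nodes%,}
    echo "sbatch --exclude=$nodes $script"
  fi
}

# dry-run demonstration
build_resubmit restart_md.sh compute-node
# prints: sbatch --exclude=compute-node restart_md.sh
```

How the failed node is detected is site-specific (e.g. parsing sacct or node health checks) and is left out here.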

>
>
> > only '--exclude=compute-node' that produces this error.
> >
>
> Then there's something about that node that is special with respect to the
> file system - there's nothing about any particular node that GROMACS cares
> about.
>

> Mark
>
>
> > is this has the same issue with this thread ?
> > http://comments.gmane.org/gmane.science.biology.gromacs.user/40984
> >
> > regards,
> >
> > Husen
> >
> > On Thu, Jun 16, 2016 at 2:20 PM, Mark Abraham 
> > wrote:
> >
> > > Hi,
> > >
> > > The stuff about different nodes or numbers of nodes doesn't matter -
> it's
> > > merely an advisory note from mdrun. mdrun failed when it tried to
> operate
> > > upon md_test.xtc, so perhaps you need to consider whether the file
> > exists,
> > > is writable, etc.
> > >
> > > Mark
> > >
> > > On Thu, Jun 16, 2016 at 6:48 AM Husen R  wrote:
> > >
> > > > Hi all,
> > > >
> > > > I got the following error message when I tried to restart a GROMACS
> > > > simulation from a checkpoint file.
> > > > I restarted the simulation using fewer nodes and processes, and I also
> > > > excluded one node using the '--exclude=' option (in Slurm) for
> > > > experimental purposes.
> > > >
> > > > I'm sure fewer nodes and processes are not the cause of this error,
> > > > as I have already tested that.
> > > > I have checked that the cause of this error is the '--exclude=' usage.
> > > > I excluded one node named 'compute-node' when restarting from the
> > > > checkpoint (at the first run, I used all nodes, including
> > > > 'compute-node').
> > > >
> > > >
> > > > It seems that at the first run, the submit job script was built on
> > > > compute-node. So, at restart, the build user mismatch appeared because
> > > > compute-node was not found (excluded).
> > > >
> > > > Am I right? Is this behavior normal?
> > > > Or is there a way to avoid this, so I can freely restart from the
> > > > checkpoint using any nodes without limitation?
> > > >
> > > > Thank you in advance.
> > > >
> > > > Regards,
> > > >
> > > >
> > > > Husen
> > > >
> > > > ==restart script=
> > > > #!/bin/bash
> > > > #SBATCH -J ayo
> > > > #SBATCH -o md%j.out
> > > > #SBATCH -A necis
> > > > #SBATCH -N 2
> > > > #SBATCH -n 16
> > > > #SBATCH --exclude=compute-node
> > > > #SBATCH --time=144:00:00
> > > > #SBATCH --mail-user=hus...@gmail.com
> > > > #SBATCH --mail-type=begin
> > > > #SBATCH --mail-type=end
> > > >
> > > > mpirun gmx_mpi mdrun -cpi md_test.cpt -deffnm md_test
> > > > =
> > > >
> > > >
> > > >
> > > >
> > > > ==output error
> > > > Reading checkpoint file md_test.cpt generated: Wed Jun 15 16:30:44
> 2016
> > > >
> > > >
> > > >   Build time mismatch,
> > > > current program: Sel Apr  5 13:37:32 WIB 2016
> > > > checkpoint file: Rab Apr  6 09:44:51 WIB 2016
> > > >
> > > >   Build user mismatch,
> > > > current program: pro@head-node [CMAKE]
> > > > checkpoint file: pro@compute-node [CMAKE]
> > > >
> > > >   #ranks mismatch,
> > > > current program: 16
> > > > checkpoint file: 24
> > > >
> > > >   #PME-ranks mismatch,
> > > > current program: -1
> > > > checkpoint file: 6
> > > >
> > > > GROMACS patchlevel, binary or parallel settings differ from previous
> > > > run.
> > > > Continuation is exact, but not guaranteed to be binary identical.
> > > >
> > > >
> > > > ---
> > > > Program gmx mdrun, VERSION 5.1.2
> > > > Source code file:
> > > > /home/pro/gromacs-5.1.2/src/gromacs/gmxlib/checkpoint.cpp, line: 2216
> > > >
> > > > Fatal error:
> > > > Truncation of file md_test.xtc failed. Cannot do appending because of
> > > > this failure.
> > > > For more information and tips for troubleshooting, please check the
> > > > GROMACS website at http://www.gromacs.org/Documentation/Errors
> > > > ---
> > > > 

Re: [gmx-users] Build time/Build user mismatch, fatal error truncation of file *.xtc failed

2016-06-16 Thread Mark Abraham
Hi,

On Thu, Jun 16, 2016 at 9:30 AM Husen R  wrote:

> Hi,
>
> Thank you for your reply !
>
> md_test.xtc exists and is writable.
>

OK, but it needs to be seen that way from the set of compute nodes you are
using, and organizing that is up to you and your job scheduler, etc.
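One way to verify that, assuming Python is available on the nodes, is to launch a tiny probe once per node (e.g. via `srun --ntasks-per-node=1`) and compare the reports. This is a sketch; the checkpoint file name is from this thread, and the function name is made up:

```python
import os
import socket

def report_visibility(path):
    """Return a one-line report of whether `path` is visible from this
    host -- intended to be launched once per node via the scheduler."""
    status = "ok" if os.path.exists(path) else "MISSING"
    return "%s: %s %s" % (socket.gethostname(), path, status)

# Any node that prints MISSING cannot see the shared working directory.
print(report_visibility("md_test.cpt"))
```

If one node (here, apparently 'compute-node' vs. the rest) reports MISSING, the problem is filesystem configuration, not GROMACS.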


> I tried to restart from the checkpoint file by excluding a node other than
> compute-node, and it works.
>

Go do that, then :-)


> Only '--exclude=compute-node' produces this error.
>

Then there's something about that node that is special with respect to the
file system - there's nothing about any particular node that GROMACS cares
about.

Mark


> Does this have the same issue as this thread?
> http://comments.gmane.org/gmane.science.biology.gromacs.user/40984
>
> regards,
>
> Husen
>
> On Thu, Jun 16, 2016 at 2:20 PM, Mark Abraham 
> wrote:
>
> > Hi,
> >
> > The stuff about different nodes or numbers of nodes doesn't matter - it's
> > merely an advisory note from mdrun. mdrun failed when it tried to operate
> > upon md_test.xtc, so perhaps you need to consider whether the file
> > exists, is writable, etc.
> >
> > Mark
> >
> > On Thu, Jun 16, 2016 at 6:48 AM Husen R  wrote:
> >
> > > Hi all,
> > >
> > > I got the following error message when I tried to restart a GROMACS
> > > simulation from a checkpoint file.
> > > I restarted the simulation using fewer nodes and processes, and I also
> > > excluded one node using the '--exclude=' option (in Slurm) for
> > > experimental purposes.
> > >
> > > I'm sure fewer nodes and processes are not the cause of this error, as
> > > I have already tested that.
> > > I have checked that the cause of this error is the '--exclude=' usage.
> > > I excluded one node named 'compute-node' when restarting from the
> > > checkpoint (at the first run, I used all nodes, including
> > > 'compute-node').
> > >
> > >
> > > It seems that at the first run, the submit job script was built on
> > > compute-node. So, at restart, the build user mismatch appeared because
> > > compute-node was not found (excluded).
> > >
> > > Am I right? Is this behavior normal?
> > > Or is there a way to avoid this, so I can freely restart from the
> > > checkpoint using any nodes without limitation?
> > >
> > > Thank you in advance.
> > >
> > > Regards,
> > >
> > >
> > > Husen
> > >
> > > ==restart script=
> > > #!/bin/bash
> > > #SBATCH -J ayo
> > > #SBATCH -o md%j.out
> > > #SBATCH -A necis
> > > #SBATCH -N 2
> > > #SBATCH -n 16
> > > #SBATCH --exclude=compute-node
> > > #SBATCH --time=144:00:00
> > > #SBATCH --mail-user=hus...@gmail.com
> > > #SBATCH --mail-type=begin
> > > #SBATCH --mail-type=end
> > >
> > > mpirun gmx_mpi mdrun -cpi md_test.cpt -deffnm md_test
> > > =
> > >
> > >
> > >
> > >
> > > ==output error
> > > Reading checkpoint file md_test.cpt generated: Wed Jun 15 16:30:44 2016
> > >
> > >
> > >   Build time mismatch,
> > > current program: Sel Apr  5 13:37:32 WIB 2016
> > > checkpoint file: Rab Apr  6 09:44:51 WIB 2016
> > >
> > >   Build user mismatch,
> > > current program: pro@head-node [CMAKE]
> > > checkpoint file: pro@compute-node [CMAKE]
> > >
> > >   #ranks mismatch,
> > > current program: 16
> > > checkpoint file: 24
> > >
> > >   #PME-ranks mismatch,
> > > current program: -1
> > > checkpoint file: 6
> > >
> > > GROMACS patchlevel, binary or parallel settings differ from previous
> > > run.
> > > Continuation is exact, but not guaranteed to be binary identical.
> > >
> > >
> > > ---
> > > Program gmx mdrun, VERSION 5.1.2
> > > Source code file:
> > > /home/pro/gromacs-5.1.2/src/gromacs/gmxlib/checkpoint.cpp, line: 2216
> > >
> > > Fatal error:
> > > Truncation of file md_test.xtc failed. Cannot do appending because of
> > > this failure.
> > > For more information and tips for troubleshooting, please check the
> > > GROMACS website at http://www.gromacs.org/Documentation/Errors
> > > ---
> > > 

Re: [gmx-users] Build time/Build user mismatch, fatal error truncation of file *.xtc failed

2016-06-16 Thread Husen R
Hi,

Thank you for your reply !

md_test.xtc exists and is writable.
I tried to restart from the checkpoint file by excluding a node other than
compute-node, and it works.
Only '--exclude=compute-node' produces this error.

Does this have the same issue as this thread?
http://comments.gmane.org/gmane.science.biology.gromacs.user/40984

regards,

Husen

On Thu, Jun 16, 2016 at 2:20 PM, Mark Abraham 
wrote:

> Hi,
>
> The stuff about different nodes or numbers of nodes doesn't matter - it's
> merely an advisory note from mdrun. mdrun failed when it tried to operate
> upon md_test.xtc, so perhaps you need to consider whether the file exists,
> is writable, etc.
>
> Mark
>
> On Thu, Jun 16, 2016 at 6:48 AM Husen R  wrote:
>
> > Hi all,
> >
> > I got the following error message when I tried to restart a GROMACS
> > simulation from a checkpoint file.
> > I restarted the simulation using fewer nodes and processes, and I also
> > excluded one node using the '--exclude=' option (in Slurm) for
> > experimental purposes.
> >
> > I'm sure fewer nodes and processes are not the cause of this error, as I
> > have already tested that.
> > I have checked that the cause of this error is the '--exclude=' usage. I
> > excluded one node named 'compute-node' when restarting from the
> > checkpoint (at the first run, I used all nodes, including 'compute-node').
> >
> >
> > It seems that at the first run, the submit job script was built on
> > compute-node. So, at restart, the build user mismatch appeared because
> > compute-node was not found (excluded).
> >
> > Am I right? Is this behavior normal?
> > Or is there a way to avoid this, so I can freely restart from the
> > checkpoint using any nodes without limitation?
> >
> > Thank you in advance.
> >
> > Regards,
> >
> >
> > Husen
> >
> > ==restart script=
> > #!/bin/bash
> > #SBATCH -J ayo
> > #SBATCH -o md%j.out
> > #SBATCH -A necis
> > #SBATCH -N 2
> > #SBATCH -n 16
> > #SBATCH --exclude=compute-node
> > #SBATCH --time=144:00:00
> > #SBATCH --mail-user=hus...@gmail.com
> > #SBATCH --mail-type=begin
> > #SBATCH --mail-type=end
> >
> > mpirun gmx_mpi mdrun -cpi md_test.cpt -deffnm md_test
> > =
> >
> >
> >
> >
> > ==output error
> > Reading checkpoint file md_test.cpt generated: Wed Jun 15 16:30:44 2016
> >
> >
> >   Build time mismatch,
> > current program: Sel Apr  5 13:37:32 WIB 2016
> > checkpoint file: Rab Apr  6 09:44:51 WIB 2016
> >
> >   Build user mismatch,
> > current program: pro@head-node [CMAKE]
> > checkpoint file: pro@compute-node [CMAKE]
> >
> >   #ranks mismatch,
> > current program: 16
> > checkpoint file: 24
> >
> >   #PME-ranks mismatch,
> > current program: -1
> > checkpoint file: 6
> >
> > GROMACS patchlevel, binary or parallel settings differ from previous run.
> > Continuation is exact, but not guaranteed to be binary identical.
> >
> >
> > ---
> > Program gmx mdrun, VERSION 5.1.2
> > Source code file:
> > /home/pro/gromacs-5.1.2/src/gromacs/gmxlib/checkpoint.cpp, line: 2216
> >
> > Fatal error:
> > Truncation of file md_test.xtc failed. Cannot do appending because of
> > this failure.
> > For more information and tips for troubleshooting, please check the
> > GROMACS website at http://www.gromacs.org/Documentation/Errors
> > ---
> > 
-- 
Gromacs Users mailing list

* Please search the archive at 
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
mail to gmx-users-requ...@gromacs.org.


Re: [gmx-users] Build time/Build user mismatch, fatal error truncation of file *.xtc failed

2016-06-16 Thread Mark Abraham
Hi,

The stuff about different nodes or numbers of nodes doesn't matter - it's
merely an advisory note from mdrun. mdrun failed when it tried to operate
upon md_test.xtc, so perhaps you need to consider whether the file exists,
is writable, etc.
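A small sketch of that check (the helper name is hypothetical; run it on each node the restart will use, e.g. one task per node through the scheduler):

```python
import os

def can_append(path):
    """Return True if `path` is an existing regular file that the current
    user is allowed to write (a proxy for 'appending will work')."""
    return os.path.isfile(path) and os.access(path, os.W_OK)

# Launched per node, e.g.:
#   srun --ntasks-per-node=1 python3 check_xtc.py
print(can_append("md_test.xtc"))
```

A node that prints False here would also fail mdrun's truncation step when it tries to append to the trajectory.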

Mark

On Thu, Jun 16, 2016 at 6:48 AM Husen R  wrote:

> Hi all,
>
> I got the following error message when I tried to restart a GROMACS
> simulation from a checkpoint file.
> I restarted the simulation using fewer nodes and processes, and I also
> excluded one node using the '--exclude=' option (in Slurm) for
> experimental purposes.
>
> I'm sure fewer nodes and processes are not the cause of this error, as I
> have already tested that.
> I have checked that the cause of this error is the '--exclude=' usage. I
> excluded one node named 'compute-node' when restarting from the checkpoint
> (at the first run, I used all nodes, including 'compute-node').
>
>
> It seems that at the first run, the submit job script was built on
> compute-node. So, at restart, the build user mismatch appeared because
> compute-node was not found (excluded).
>
> Am I right? Is this behavior normal?
> Or is there a way to avoid this, so I can freely restart from the
> checkpoint using any nodes without limitation?
>
> Thank you in advance.
>
> Regards,
>
>
> Husen
>
> ==restart script=
> #!/bin/bash
> #SBATCH -J ayo
> #SBATCH -o md%j.out
> #SBATCH -A necis
> #SBATCH -N 2
> #SBATCH -n 16
> #SBATCH --exclude=compute-node
> #SBATCH --time=144:00:00
> #SBATCH --mail-user=hus...@gmail.com
> #SBATCH --mail-type=begin
> #SBATCH --mail-type=end
>
> mpirun gmx_mpi mdrun -cpi md_test.cpt -deffnm md_test
> =
>
>
>
>
> ==output error
> Reading checkpoint file md_test.cpt generated: Wed Jun 15 16:30:44 2016
>
>
>   Build time mismatch,
> current program: Sel Apr  5 13:37:32 WIB 2016
> checkpoint file: Rab Apr  6 09:44:51 WIB 2016
>
>   Build user mismatch,
> current program: pro@head-node [CMAKE]
> checkpoint file: pro@compute-node [CMAKE]
>
>   #ranks mismatch,
> current program: 16
> checkpoint file: 24
>
>   #PME-ranks mismatch,
> current program: -1
> checkpoint file: 6
>
> GROMACS patchlevel, binary or parallel settings differ from previous run.
> Continuation is exact, but not guaranteed to be binary identical.
>
>
> ---
> Program gmx mdrun, VERSION 5.1.2
> Source code file:
> /home/pro/gromacs-5.1.2/src/gromacs/gmxlib/checkpoint.cpp, line: 2216
>
> Fatal error:
> Truncation of file md_test.xtc failed. Cannot do appending because of this
> failure.
> For more information and tips for troubleshooting, please check the GROMACS
> website at http://www.gromacs.org/Documentation/Errors
> ---
> 


Re: [gmx-users] Build time/Build user mismatch, fatal error truncation of file *.xtc failed

2016-06-15 Thread Husen R
This is the rest of the error message:
regards,

Husen




Halting parallel program gmx mdrun on rank 0 out of 16
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Fatal error in PMPI_Bcast: Unknown error class, error stack:
PMPI_Bcast(1635)..: MPI_Bcast(buf=0xcd9ed8, count=4,
MPI_BYTE, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1477).:
MPIR_Bcast(1501)..:
MPIR_Bcast_intra(1272):
MPIR_SMP_Bcast(1104)..:
MPIR_Bcast_binomial(256)..:
MPIDU_Complete_posted_with_error(1189): Process failed
MPIR_SMP_Bcast()..:
MPIR_Bcast_binomial(327)..: Failure during collective
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(1635): MPI_Bcast(buf=0x1858e78, count=4, MPI_BYTE,
root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1477)...:
MPIR_Bcast(1501):
MPIR_Bcast_intra(1272)..:
MPIR_SMP_Bcast():
MPIR_Bcast_binomial(327): Failure during collective
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(1635): MPI_Bcast(buf=0x24f7e78, count=4, MPI_BYTE,
root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1477)...:
MPIR_Bcast(1501):
MPIR_Bcast_intra(1272)..:
MPIR_SMP_Bcast():
MPIR_Bcast_binomial(327): Failure during collective
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(1635): MPI_Bcast(buf=0xb21e78, count=4, MPI_BYTE,
root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1477)...:
MPIR_Bcast(1501):
MPIR_Bcast_intra(1272)..:
MPIR_SMP_Bcast():
MPIR_Bcast_binomial(327): Failure during collective
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(1635): MPI_Bcast(buf=0x15fbe78, count=4, MPI_BYTE,
root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1477)...:
MPIR_Bcast(1501):
MPIR_Bcast_intra(1272)..:
MPIR_SMP_Bcast():
MPIR_Bcast_binomial(327): Failure during collective

===
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 6983 RUNNING AT head-node
=   EXIT CODE: 1
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===

On Thu, Jun 16, 2016 at 11:48 AM, Husen R  wrote:

> Hi all,
>
> I got the following error message when I tried to restart a GROMACS
> simulation from a checkpoint file.
> I restarted the simulation using fewer nodes and processes, and I also
> excluded one node using the '--exclude=' option (in Slurm) for
> experimental purposes.
>
> I'm sure fewer nodes and processes are not the cause of this error, as I
> have already tested that.
> I have checked that the cause of this error is the '--exclude=' usage. I
> excluded one node named 'compute-node' when restarting from the checkpoint
> (at the first run, I used all nodes, including 'compute-node').
>
>
> It seems that at the first run, the submit job script was built on
> compute-node. So, at restart, the build user mismatch appeared because
> compute-node was not found (excluded).
>
> Am I right? Is this behavior normal?
> Or is there a way to avoid this, so I can freely restart from the
> checkpoint using any nodes without limitation?
>
> Thank you in advance.
>
> Regards,
>
>
> Husen
>
> ==restart script=
> #!/bin/bash
> #SBATCH -J ayo
> #SBATCH -o md%j.out
> #SBATCH -A necis
> #SBATCH -N 2
> #SBATCH -n 16
> #SBATCH --exclude=compute-node
> #SBATCH --time=144:00:00
> #SBATCH --mail-user=hus...@gmail.com
> #SBATCH --mail-type=begin
> #SBATCH --mail-type=end
>
> mpirun gmx_mpi mdrun -cpi md_test.cpt -deffnm md_test
> =
>
>
>
>
> ==output error
> Reading checkpoint file md_test.cpt generated: Wed Jun 15 16:30:44 2016
>
>
>   Build time mismatch,
> current program: Sel Apr  5 13:37:32 WIB 2016
> checkpoint file: Rab Apr  6 09:44:51 WIB 2016
>
>   Build user mismatch,
> current program: pro@head-node [CMAKE]
> checkpoint file: pro@compute-node [CMAKE]
>
>   #ranks mismatch,
> current program: 16
> checkpoint file: 24
>
>   #PME-ranks mismatch,
> current program: -1
> checkpoint file: 6
>
> GROMACS patchlevel, binary or parallel settings differ from previous run.
> Continuation is exact, but not guaranteed to be binary identical.
>
>
> ---
> Program gmx mdrun, VERSION 5.1.2
> Source code file:
> /home/pro/gromacs-5.1.2/src/gromacs/gmxlib/checkpoint.cpp, line: 2216
>
> Fatal error:
> Truncation of file md_test.xtc failed. Cannot do appending because of this
> failure.
> For more information and tips for troubleshooting, please check the GROMACS
> website at http://www.gromacs.org/Documentation/Errors
> ---
> 

[gmx-users] Build time/Build user mismatch, fatal error truncation of file *.xtc failed

2016-06-15 Thread Husen R
Hi all,

I got the following error message when I tried to restart a GROMACS
simulation from a checkpoint file.
I restarted the simulation using fewer nodes and processes, and I also
excluded one node using the '--exclude=' option (in Slurm) for
experimental purposes.

I'm sure fewer nodes and processes are not the cause of this error, as I
have already tested that.
I have checked that the cause of this error is the '--exclude=' usage. I
excluded one node named 'compute-node' when restarting from the checkpoint
(at the first run, I used all nodes, including 'compute-node').


It seems that at the first run, the submit job script was built on
compute-node. So, at restart, the build user mismatch appeared because
compute-node was not found (excluded).

Am I right? Is this behavior normal?
Or is there a way to avoid this, so I can freely restart from the
checkpoint using any nodes without limitation?

Thank you in advance.

Regards,


Husen

==restart script=
#!/bin/bash
#SBATCH -J ayo
#SBATCH -o md%j.out
#SBATCH -A necis
#SBATCH -N 2
#SBATCH -n 16
#SBATCH --exclude=compute-node
#SBATCH --time=144:00:00
#SBATCH --mail-user=hus...@gmail.com
#SBATCH --mail-type=begin
#SBATCH --mail-type=end

mpirun gmx_mpi mdrun -cpi md_test.cpt -deffnm md_test
=




==output error
Reading checkpoint file md_test.cpt generated: Wed Jun 15 16:30:44 2016


  Build time mismatch,
current program: Sel Apr  5 13:37:32 WIB 2016
checkpoint file: Rab Apr  6 09:44:51 WIB 2016

  Build user mismatch,
current program: pro@head-node [CMAKE]
checkpoint file: pro@compute-node [CMAKE]

  #ranks mismatch,
current program: 16
checkpoint file: 24

  #PME-ranks mismatch,
current program: -1
checkpoint file: 6

GROMACS patchlevel, binary or parallel settings differ from previous run.
Continuation is exact, but not guaranteed to be binary identical.


---
Program gmx mdrun, VERSION 5.1.2
Source code file:
/home/pro/gromacs-5.1.2/src/gromacs/gmxlib/checkpoint.cpp, line: 2216

Fatal error:
Truncation of file md_test.xtc failed. Cannot do appending because of this
failure.
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
---
