Re: [gmx-users] Restart simulation from checkpoint file with fewer nodes

2016-05-16 Thread Husen R
Hi all,

After spending some time troubleshooting, I found that the GROMACS
checkpoint/restart feature is working well.

The failure occurred because I used the root user to submit the restart job
(via the Slurm resource manager). After switching to a non-root user, the
restart runs fine.
The reason I used root is that I run this job from a bash script that is
executed at a designated time by cron.
I know this is not the right place to discuss Slurm.
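
For reference, the working non-root setup looks roughly like this; the
schedule, paths, and script names are placeholders rather than my exact files:

# crontab entry of the regular (non-root) user, installed with `crontab -e`:
# resubmit the continuation at 02:00 every day
0 2 * * * /home/husen/md/restart_md.sh >> /home/husen/md/restart.log 2>&1

and the script it points at:

#!/bin/bash
# restart_md.sh -- submit the GROMACS continuation job as the regular user
cd /home/husen/md       # placeholder run directory
sbatch restart.sbatch   # restart.sbatch invokes mdrun with -cpi (see the thread below)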

Thank you for your replies!

Regards,

Husen


Re: [gmx-users] Restart simulation from checkpoint file with fewer nodes

2016-05-15 Thread jkrieger
ok thanks


Re: [gmx-users] Restart simulation from checkpoint file with fewer nodes

2016-05-15 Thread Mark Abraham
Hi,

Yes, that's one way to work around the problem. In some places, a module
subsystem can be used to take care of the selection automatically, but you
wouldn't want to set one up just for your own use.
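
Where such a setup does exist, selecting the right build is just one line in
the job script, e.g. (module names here are purely illustrative and
site-specific):

module load gromacs/5.1.2-avx2_256   # on the newer nodes
# or, on the older nodes:
module load gromacs/5.1.2-sse4.1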

Mark


Re: [gmx-users] Restart simulation from checkpoint file with fewer nodes

2016-05-15 Thread jkrieger
Thanks Mark,

My sysadmins have let me install my own GROMACS versions and have not
informed me of any such mechanism. Would you suggest I qrsh into a node of
each type and build an mdrun-only version on each? I'd then pick a
particular node type in each submit script and use the matching mdrun.
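
Concretely, I was thinking of something along these lines on each node type
(the GMX_SIMD values and install prefixes are my guesses for these CPUs and
untested):

# e.g. on an m630 (Haswell) node, after qrsh-ing onto it
cmake .. -DGMX_BUILD_MDRUN_ONLY=ON -DGMX_SIMD=AVX2_256 \
      -DCMAKE_INSTALL_PREFIX=$HOME/gromacs-5.0.4-m630
make -j 8 && make install

# e.g. on an m620/m620+ (Sandy/Ivy Bridge) node
cmake .. -DGMX_BUILD_MDRUN_ONLY=ON -DGMX_SIMD=AVX_256 \
      -DCMAKE_INSTALL_PREFIX=$HOME/gromacs-5.0.4-m620
make -j 8 && make install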

Many thanks
James


Re: [gmx-users] Restart simulation from checkpoint file with fewer nodes

2016-05-14 Thread Mark Abraham
Hi,

On Sat, May 14, 2016 at 1:09 PM  wrote:

> I could only run GROMACS 4.6.2 on the last three node types and I believe
> the same is true for 5.0.4
>

Sure. GROMACS is designed to target whichever hardware was selected at
configure time, which your sysadmins for such a heterogeneous cluster
should have documented somewhere. They should also be making available to
you a mechanism to target your jobs to nodes where they can run programs
that use the hardware efficiently, or providing GROMACS installations that
work regardless of which node you are actually on. You might like to
respectfully remind them of the things we say at
http://manual.gromacs.org/documentation/5.1.2/install-guide/index.html#portability-aspects
(These thoughts are common to earlier versions also.)
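
As a rough sketch of the latter approach (install prefixes and the binary
name are hypothetical and depend entirely on how the builds were made), a
small wrapper can pick whichever mdrun matches the node a job lands on:

#!/bin/bash
# mdrun_wrapper.sh -- choose an mdrun build matching this node's SIMD support
if grep -q avx2 /proc/cpuinfo; then
    MDRUN=$HOME/gromacs-avx2_256/bin/mdrun_mpi
elif grep -q avx /proc/cpuinfo; then
    MDRUN=$HOME/gromacs-avx_256/bin/mdrun_mpi
else
    MDRUN=$HOME/gromacs-sse4.1/bin/mdrun_mpi
fi
exec "$MDRUN" "$@"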

Mark



Re: [gmx-users] Restart simulation from checkpoint file with fewer nodes

2016-05-14 Thread jkrieger
In case it's relevant/interesting to anyone, here are the details on our
cluster nodes:

nodes             #    model           # cores   cpu model                 RAM    node_type
fmb01 - fmb33     33   IBM HS21XM      8         3 GHz Xeon E5450          16GB   hs21
fmb34 - fmb42     9    IBM HS22        8         2.4 GHz Xeon E5530        16GB   hs22
fmb43 - fmb88     45   Dell PE M610    8         2.4 GHz Xeon E5530        16GB   m610
fmb88 - fmb90     3    Dell PE M610+   12        3.4 GHz Xeon X5690        48GB   m610+
fmb91 - fmb202    112  Dell PE M620    24 (HT)   2.9 GHz Xeon E5-2667      64GB   m620
fmb203 - fmb279   77   Dell PE M620    24 (HT)   3.5 GHz Xeon E5-2643 v2   64GB   m620+
fmb280 - fmb359   80   Dell PE M630    24 (HT)   3.4 GHz Xeon E5-2643 v3   64GB   m630

I could only run GROMACS 4.6.2 on the last three node types and I believe
the same is true for 5.0.4

Best wishes
James



Re: [gmx-users] Restart simulation from checkpoint file with fewer nodes

2016-05-14 Thread Justin Lemkul



On 5/14/16 3:48 AM, Husen R wrote:

> Hi,
>
> Currently I'm running this tutorial (
> http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin/gmx-tutorials/lysozyme/08_MD.html)
> to test a restart with fewer nodes. At restart, I changed the number of
> nodes from 3 to 2 and the number of processes from 24 to 16.
>
> While the application is running, I checked the output file. This is its
> content:
>
> #output file
>
> Reading checkpoint file md_0_1.cpt generated: Sat May 14 13:10:25 2016
>
>   #ranks mismatch,
> current program: 16
> checkpoint file: 24
>
>   #PME-ranks mismatch,
> current program: -1
> checkpoint file: 6
>
> GROMACS patchlevel, binary or parallel settings differ from previous run.
> Continuation is exact, but not guaranteed to be binary identical.
>
> Using 16 MPI processes
> Using 1 OpenMP thread per MPI process
>
> starting mdrun 'LYSOZYME in water'
> 50 steps,   1000.0 ps (continuing from step 54500,109.0 ps).
>
> I got the mismatch note shown in the output above. This is not a problem,
> is it? I just want to make sure.



This is the message I mentioned in my first reply.  It just means you're now 
changing the DD configuration, PME nodes, etc. so it's not binary identical, but 
the state is faithfully preserved.


http://www.gromacs.org/Documentation/How-tos/Extending_Simulations#Exact_vs_binary_identical_continuation


> Is it a problem to restart a simulation from a checkpoint file as a
> different user?
> Previously, my restart from the checkpoint file failed, and my guess is
> that it failed because I used a different user (only a guess).


Presumably one just needs correct read/write permissions, though I have never 
tried to switch users when doing a continuation.
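
Something simple like the following should tell you whether it will work
(untested; the group name and run directory are placeholders, and the file
names just follow the tutorial's md_0_1 prefix):

# check who owns the run files and whether the restarting user can write them
ls -l md_0_1.cpt md_0_1.log md_0_1.xtc md_0_1.edr
# e.g. give a shared group read/write access to the whole run directory
chgrp -R mdusers /path/to/run && chmod -R g+rw /path/to/run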


-Justin

--
==

Justin A. Lemkul, Ph.D.
Ruth L. Kirschstein NRSA Postdoctoral Fellow

Department of Pharmaceutical Sciences
School of Pharmacy
Health Sciences Facility II, Room 629
University of Maryland, Baltimore
20 Penn St.
Baltimore, MD 21201

jalem...@outerbanks.umaryland.edu | (410) 706-7441
http://mackerell.umaryland.edu/~jalemkul

==


Re: [gmx-users] Restart simulation from checkpoint file with fewer nodes

2016-05-14 Thread Husen R
Hi,

Currently I'm running this tutorial (
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin/gmx-tutorials/lysozyme/08_MD.html)
to test a restart with fewer nodes. At restart, I changed the number of
nodes from 3 to 2 and the number of processes from 24 to 16.
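
The restart submission looks roughly like this (the job name, binary name,
and module handling are placeholders for my site; the relevant parts are the
reduced node/task counts and -cpi):

#!/bin/bash
#SBATCH --job-name=md_restart
#SBATCH --nodes=2            # was 3 before the node crashed
#SBATCH --ntasks=16          # was 24 before the node crashed
srun gmx_mpi mdrun -deffnm md_0_1 -cpi md_0_1.cpt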

While the application is running, I checked the output file. This is its
content:

#output file


Reading checkpoint file md_0_1.cpt generated: Sat May 14 13:10:25 2016

  #ranks mismatch,
current program: 16
checkpoint file: 24

  #PME-ranks mismatch,
current program: -1
checkpoint file: 6

GROMACS patchlevel, binary or parallel settings differ from previous run.
Continuation is exact, but not guaranteed to be binary identical.

Using 16 MPI processes
Using 1 OpenMP thread per MPI process

starting mdrun 'LYSOZYME in water'
50 steps,   1000.0 ps (continuing from step 54500,109.0 ps).



I got the mismatch note shown in the output above. This is not a problem,
is it? I just want to make sure.

Is it a problem to restart a simulation from a checkpoint file as a
different user?
Previously, my restart from the checkpoint file failed, and my guess is
that it failed because I used a different user (only a guess).
Thank you in advance.

regards,

Husen






Re: [gmx-users] Restart simulation from checkpoint file with fewer nodes

2016-05-13 Thread Husen R
Thanks a lot for your fast response.

I have tried it, and it failed; I asked on this forum just to make sure.
However, there was probably something in my cluster that made it fail.
I'll sort that out first and then retry the restart.

Regards,

Husen



Re: [gmx-users] Restart simulation from checkpoint file with fewer nodes

2016-05-13 Thread Justin Lemkul



On 5/13/16 8:53 PM, Husen R wrote:

> Dear all
>
> Can a simulation be restarted from a checkpoint file with fewer nodes?
> Let's say I initially run a simulation on 3 nodes. While it is running,
> one of those nodes crashes and the simulation is terminated.
>
> I want to restart that simulation immediately from the checkpoint file
> with the remaining 2 nodes. Does GROMACS support such a case?
> I need help.


Have you tried it?  It should work.  You will probably get a note about the 
continuation not being exact due to a change in the number of cores, but the run 
should proceed fine.
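
In the simplest case the continuation is just the same mdrun invocation
pointed at the checkpoint, now run with however many ranks you have left,
e.g. (generic file names, and assuming an MPI build whose binary is gmx_mpi;
yours may be called mdrun_mpi instead):

mpirun -np 16 gmx_mpi mdrun -s topol.tpr -cpi state.cpt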


-Justin

--
==

Justin A. Lemkul, Ph.D.
Ruth L. Kirschstein NRSA Postdoctoral Fellow

Department of Pharmaceutical Sciences
School of Pharmacy
Health Sciences Facility II, Room 629
University of Maryland, Baltimore
20 Penn St.
Baltimore, MD 21201

jalem...@outerbanks.umaryland.edu | (410) 706-7441
http://mackerell.umaryland.edu/~jalemkul

==


Re: [gmx-users] Restart simulation from checkpoint file with fewer nodes

2016-05-13 Thread Husen R
I use Gromacs-5.1.2 and SLURM-15.08.10 as a resource manager.



[gmx-users] Restart simulation from checkpoint file with fewer nodes

2016-05-13 Thread Husen R
Dear all

Can a simulation be restarted from a checkpoint file with fewer nodes?
Let's say I initially run a simulation on 3 nodes. While it is running,
one of those nodes crashes and the simulation is terminated.

I want to restart that simulation immediately from the checkpoint file
with the remaining 2 nodes. Does GROMACS support such a case?
I need help.

Thank you in advance.
Regards,

Husen
-- 
Gromacs Users mailing list

* Please search the archive at 
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
mail to gmx-users-requ...@gromacs.org.