Re: [gmx-users] Restart simulation from checkpoint file with fewer nodes
Hi all,

After spending some time troubleshooting, I found that the GROMACS checkpoint/restart feature works well. The failure occurred because I used the root user to submit the restart job (via the Slurm resource manager). After switching to a non-root user, the restart runs fine. The reason I used root is that I launch this job from a bash script executed at a designated time by cron. I know this is not the right place to discuss Slurm.

Thank you for your reply!

Regards,
Husen
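Husen's cron-plus-Slurm setup above can be sketched as below. The script name, paths, resource counts, and schedule are illustrative assumptions, not details from the thread; the point is that the crontab doing the submission belongs to a regular (non-root) user.

```shell
#!/bin/bash
# Illustrative Slurm restart job script (restart_job.sh); file names
# follow the lysozyme-tutorial naming used later in this thread (md_0_1.*).
#SBATCH --job-name=md_restart
#SBATCH --nodes=2
#SBATCH --ntasks=16

# Continue the run from the checkpoint file.
srun gmx_mpi mdrun -deffnm md_0_1 -cpi md_0_1.cpt

# Submit from the *non-root* user's crontab at the designated time, e.g.
# (path and schedule are hypothetical):
#   0 2 * * * /usr/bin/sbatch /home/husen/restart_job.sh
```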
Re: [gmx-users] Restart simulation from checkpoint file with fewer nodes
OK, thanks.
Re: [gmx-users] Restart simulation from checkpoint file with fewer nodes
Hi,

Yes, that's one way to work around the problem. In some places, a module subsystem can be used to take care of the selection automatically, but you don't want to set one up just for your own use.

Mark
Re: [gmx-users] Restart simulation from checkpoint file with fewer nodes
Thanks Mark,

My sysadmins have let me install my own GROMACS versions and have not informed me of any such mechanism. Would you suggest I qrsh into a node of each type and build an mdrun-only version on each? I'd then select a particular node type for a submit script with the relevant mdrun.

Many thanks,
James
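The per-node-type approach James describes might look like the sketch below. The install prefixes, the node_type value, and the GMX_BUILD_MDRUN_ONLY CMake option are assumptions for illustration (that option belongs to the 5.x-era build system); check the install guide for the versions actually in use.

```shell
# Hypothetical sketch: from a qrsh session on one node of each type,
# configure and install an mdrun-only build under a per-type prefix:
#   cmake .. -DGMX_BUILD_MDRUN_ONLY=ON \
#            -DCMAKE_INSTALL_PREFIX="$HOME/gromacs/m620"
#   make -j 8 && make install
#
# A submit script for a given node type then puts the matching build
# on PATH (node_type and the directory layout are assumptions):
node_type=m620
export PATH="$HOME/gromacs/${node_type}/bin:$PATH"
mdrun -s topol.tpr
```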
Re: [gmx-users] Restart simulation from checkpoint file with fewer nodes
Hi,

Sure. GROMACS is designed to target whichever hardware was selected at configure time, which your sysadmins for such a heterogeneous cluster should have documented somewhere. They should also be making available to you a mechanism to target your jobs to nodes where they can run programs that use the hardware efficiently, or providing GROMACS installations that work regardless of which node you are actually on. You might like to respectfully remind them of the things we say at
http://manual.gromacs.org/documentation/5.1.2/install-guide/index.html#portability-aspects
(These thoughts are common to earlier versions also.)

Mark
Re: [gmx-users] Restart simulation from checkpoint file with fewer nodes
In case it's relevant/interesting to anyone, here are the details of our cluster nodes:

nodes            #    model          cores    CPU                      RAM   node_type
fmb01 - fmb33    33   IBM HS21XM     8        3 GHz Xeon E5450         16GB  hs21
fmb34 - fmb42    9    IBM HS22       8        2.4 GHz Xeon E5530       16GB  hs22
fmb43 - fmb88    45   Dell PE M610   8        2.4 GHz Xeon E5530       16GB  m610
fmb88 - fmb90    3    Dell PE M610+  12       3.4 GHz Xeon X5690       48GB  m610+
fmb91 - fmb202   112  Dell PE M620   24 (HT)  2.9 GHz Xeon E5-2667     64GB  m620
fmb203 - fmb279  77   Dell PE M620   24 (HT)  3.5 GHz Xeon E5-2643 v2  64GB  m620+
fmb280 - fmb359  80   Dell PE M630   24 (HT)  3.4 GHz Xeon E5-2643 v3  64GB  m630

I could only run GROMACS 4.6.2 on the last three node types, and I believe the same is true for 5.0.4.

Best wishes,
James

> I have found that only some kinds of nodes on our cluster work for gromacs
> 4.6 (the ones we call m620, m620+ and m630 but not others - I can check
> the details tomorrow). I haven't tested it again now I'm using 5.0, so I
> don't know if that's still an issue, but if it is, it could explain why
> your restart failed even though the initial run didn't.
Re: [gmx-users] Restart simulation from checkpoint file with fewer nodes
On 5/14/16 3:48 AM, Husen R wrote:

> I got a mismatch note as described in the output file above. It is not a
> problem, is it? I just want to make sure.

This is the message I mentioned in my first reply. It just means you're now changing the DD configuration, PME nodes, etc., so the run is not binary identical, but the state is faithfully preserved.

http://www.gromacs.org/Documentation/How-tos/Extending_Simulations#Exact_vs_binary_identical_continuation

> Is it not allowed to use a different user when we restart a simulation
> from a checkpoint file? Previously, I failed to restart a simulation from
> the checkpoint file. I guess it failed because I used a different user
> (only a guess).

Presumably one just needs correct read/write permissions, though I have never tried to switch users when doing a continuation.

-Justin

--
==

Justin A. Lemkul, Ph.D.
Ruth L. Kirschstein NRSA Postdoctoral Fellow

Department of Pharmaceutical Sciences
School of Pharmacy
Health Sciences Facility II, Room 629
University of Maryland, Baltimore
20 Penn St.
Baltimore, MD 21201

jalem...@outerbanks.umaryland.edu | (410) 706-7441
http://mackerell.umaryland.edu/~jalemkul

==
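Justin's point about read/write permissions can be checked from the shell before resubmitting. A minimal sketch, assuming the tutorial's md_0_1.* file names; run it as the user who will perform the restart.

```shell
# Return success only if a file exists and is both readable and
# writable by the current user.
check_rw() {
    [ -e "$1" ] && [ -r "$1" ] && [ -w "$1" ]
}

# Checkpoint plus the outputs mdrun will append to (file names assume
# the tutorial's -deffnm md_0_1 naming).
for f in md_0_1.cpt md_0_1.log md_0_1.trr md_0_1.edr; do
    if check_rw "$f"; then
        echo "ok: $f"
    else
        echo "problem: $f missing or not readable/writable" >&2
    fi
done
```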
Re: [gmx-users] Restart simulation from checkpoint file with fewer nodes
Hi,

Currently I'm running this tutorial
(http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin/gmx-tutorials/lysozyme/08_MD.html)
to test a restart with fewer nodes. At restart, I changed the number of nodes from 3 to 2, and the number of processes from 24 to 16. While the application was running, I looked at the output file. This is its content:

# output file
Reading checkpoint file md_0_1.cpt generated: Sat May 14 13:10:25 2016

  #ranks mismatch,     current program: 16   checkpoint file: 24
  #PME-ranks mismatch, current program: -1   checkpoint file: 6

GROMACS patchlevel, binary or parallel settings differ from previous run.
Continuation is exact, but not guaranteed to be binary identical.

Using 16 MPI processes
Using 1 OpenMP thread per MPI process

starting mdrun 'LYSOZYME in water'
500000 steps, 1000.0 ps (continuing from step 54500, 109.0 ps).

I got a mismatch note, as shown in the output above. It is not a problem, is it? I just want to make sure.

Also, is it not allowed to use a different user when we restart a simulation from a checkpoint file? Previously, I failed to restart a simulation from the checkpoint file. I guess it failed because I used a different user (only a guess).

Thank you in advance.

Regards,
Husen
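For reference, a restart with fewer ranks is just the original mdrun invocation pointed at the checkpoint file; GROMACS then redistributes the domain decomposition and prints the "#ranks mismatch" note. The mpirun flags below are illustrative and depend on the MPI stack and scheduler in use.

```shell
# Original run, e.g. 3 nodes / 24 MPI ranks (illustrative):
#   mpirun -np 24 gmx_mpi mdrun -deffnm md_0_1

# Restart on the remaining 2 nodes with 16 ranks: the same command
# plus -cpi pointing at the checkpoint file.
mpirun -np 16 gmx_mpi mdrun -deffnm md_0_1 -cpi md_0_1.cpt
```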
Re: [gmx-users] Restart simulation from checkpoint file with fewer nodes
Hi Justin,

Thanks a lot for your fast response. I had tried it and it failed; I asked on this list just to make sure it should work. However, there was something in my cluster that probably made it fail. I'll handle that first and then try the restart again.

Regards,
Husen

On Sat, May 14, 2016 at 7:58 AM, Justin Lemkul wrote:
>
> Have you tried it? It should work. You will probably get a note about
> the continuation not being exact due to a change in the number of cores,
> but the run should proceed fine.
>
> -Justin
Re: [gmx-users] Restart simulation from checkpoint file with fewer nodes
On 5/13/16 8:53 PM, Husen R wrote:
> Dear all,
>
> Can a simulation be restarted from a checkpoint file with fewer nodes?
> Let's say I first run a simulation on 3 nodes. At run time, one of those
> nodes crashes and the simulation is terminated.
>
> I want to restart that simulation immediately from the checkpoint file
> with the remaining 2 nodes. Does GROMACS support such a case?
> I need help.

Have you tried it? It should work. You will probably get a note about the continuation not being exact due to a change in the number of cores, but the run should proceed fine.

-Justin
Re: [gmx-users] Restart simulation from checkpoint file with fewer nodes
I use GROMACS 5.1.2 and SLURM 15.08.10 as the resource manager.

On Sat, May 14, 2016 at 7:53 AM, Husen R wrote:
> Dear all,
>
> Can a simulation be restarted from a checkpoint file with fewer nodes?
> Let's say I first run a simulation on 3 nodes. At run time, one of those
> nodes crashes and the simulation is terminated.
>
> I want to restart that simulation immediately from the checkpoint file
> with the remaining 2 nodes. Does GROMACS support such a case?
> I need help.
>
> Thank you in advance.
> Regards,
>
> Husen
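Since SLURM is the resource manager here, the restart can be wrapped in a batch script. A sketch under the same assumptions (an MPI build named gmx_mpi and the tutorial's md_0_1 file names; the job name and time limit are placeholders):

```shell
#!/bin/bash
#SBATCH --job-name=md_restart
#SBATCH --nodes=2        # reduced from the original 3 nodes
#SBATCH --ntasks=16      # reduced from the original 24 MPI ranks
#SBATCH --time=24:00:00

# srun launches one MPI rank per task; -cpi picks up the checkpoint
# and -append continues the existing output files.
srun gmx_mpi mdrun -deffnm md_0_1 -cpi md_0_1.cpt -append
```

Note that the job should be submitted as a regular user; as noted at the top of this thread, submitting the restart job as root caused it to fail.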
[gmx-users] Restart simulation from checkpoint file with fewer nodes
Dear all,

Can a simulation be restarted from a checkpoint file with fewer nodes? Let's say I first run a simulation on 3 nodes. At run time, one of those nodes crashes and the simulation is terminated.

I want to restart that simulation immediately from the checkpoint file with the remaining 2 nodes. Does GROMACS support such a case? I need help.

Thank you in advance.

Regards,

Husen