Re: [gmx-users] REMD and distance restraints problem in gmx 4.6.7
Hi,

On Fri, Sep 18, 2015 at 6:27 AM Christopher Neale <chris.ne...@alum.utoronto.ca> wrote:

> Dear Users:
>
> I have a system with many distance restraints, designed to maintain
> helical character, e.g.:
>
>     [ distance_restraints ]
>      90  33  1  1  2  2.745541e-01  3.122595e-01  999  1.0
>      97  57  1  2  2  2.876300e-01  2.892921e-01  999  1.0
>     114  73  1  3  2  2.704403e-01  2.929642e-01  999  1.0
>     ...
>
> Distance restraints are properly turned on in the .mdp file with:
>
>     disre=simple
>     disre-fc=1000
>
> The run works fine on a single node (gmx 4.6.7 here and for all that
> follows):
>
>     mdrun -nt 24 ...
>
> The run also works fine on two nodes:
>
>     ibrun -np 48 mdrun_mpi ...
>
> However, if I try to do temperature replica exchange (REMD), with two
> replicas and two nodes like this:
>
>     ibrun -np 48 mdrun_mpi -multi 2 -replex 200 ...
>
> then I get the error message:
>
>     Fatal error:
>     Time or ensemble averaged or multiple pair distance restraints do not
>     work (yet) with domain decomposition, use particle decomposition
>     (mdrun option -pd)

Right. Unfortunately, the ensemble-restraints code dates from some of the
very early days of GROMACS. It uses mdrun -multi and (IIRC) is hard-coded to
be on if its topology and runtime conditions are satisfied. That is, you
can't run non-ensemble distance restraints with a normal mdrun -multi. So
when REMD also uses mdrun -multi, things get confused.

> Aside: I tried particle decomposition, but when I do that without the
> REMD, simply running the 48-core job that worked fine with domain
> decomposition, I get LINCS errors and quickly a crash (note that without
> -pd I have 25 ns and counting without error):
>
>     Step 0, time 0 (ps)  LINCS WARNING in simulation 0
>     relative constraint deviation after LINCS:
>     rms 5.774043, max 48.082966 (between atoms 21554 and 21555)
>     bonds that rotated more than 30 degrees:
>      atom 1 atom 2  angle  previous, current, constraint length
>     ...
>
> So I am stuck with an error message that is not entirely helpful because
> (a) the -pd option does not solve the issue even without REMD and also (b)
> the issue seems to be related to REMD (because without REMD I can run on
> multiple nodes), though that is not mentioned in the error message.

Yes, the ensemble-restraints code is issuing that message. It doesn't know
that REMD is a thing.

> I note that Mark Abraham mentioned here:
> http://redmine.gromacs.org/issues/1117 that:
> "You can use MPI, you just can't have more than one domain (= MPI rank)
> per simulation. For a multi-simulation with distance restraints and not
> replica-exchange, you thus must have as many MPI ranks as simulations, so
> that each simulation has one rank and thus one domain."
>
> I have trouble interpreting this, as I have always thought that running
> MPI across multiple nodes requires multiple domains (apparently = MPI
> ranks), so I am confused as to why that is possible without REMD but gets
> messy with REMD.

I'm not sure why I mentioned REMD there, but the topic of that issue is
ensemble restraints.

> Final note: I am not trying to do "Time or ensemble averaged" distance
> restraints, and I think that I am not trying to do "multiple pair distance
> restraints", unless that simply means having more than one simple distance
> restraint. So at the very least I think that the error message that I get
> is confusing.

Unfortunately that's thanks to the magic helpfulness of the feature turning
itself on (IIRC). Your setting of type'=2 would probably stop the feature
doing anything, but the check doesn't know that.

> If the solution or source of error is obvious then sorry; maybe I just
> don't get MPI well enough.

No, you understand well enough; some of the code is just not good enough any
more. It occurs to me now that a one-line hack that says "oh, so you're
running mdrun -replex? you clearly don't want ensemble restraints" might
work; I sketch what I mean in the P.S. below. I'll see what I can find
(probably not before Monday).

Mark

> Thank you for your suggestions,
> Chris.
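P.S. To make that one-line hack concrete, the kind of guard I have in mind is
sketched below. This is an untested sketch, not the real 4.6.7 code: the
function name and the bDoingREMD flag are purely illustrative, and the
cut-down typedefs are only there to keep the fragment self-contained (in the
real tree, gmx_bool and t_commrec come from the GROMACS headers).

    /* Sketch only: decide whether a -multi run should be treated as a
     * restraint ensemble. With -replex, each replica keeps its distance
     * restraints local instead of averaging them across the simulations. */
    typedef int gmx_bool;            /* as in GROMACS 4.6 */
    struct gmx_multisim_t;           /* opaque here */
    typedef struct {
        struct gmx_multisim_t *ms;   /* non-NULL when mdrun -multi is used */
    } t_commrec;

    static gmx_bool use_ensemble_restraints(const t_commrec *cr,
                                            gmx_bool        bDoingREMD)
    {
        return (cr != NULL && cr->ms != NULL && !bDoingREMD);
    }

The restraint setup would then ask this question instead of assuming that
cr->ms != NULL means ensemble restraints are wanted.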
Re: [gmx-users] REMD and distance restraints problem in gmx 4.6.7
Dear Mark:

You are correct. If I get rid of the gmx_fatal call then everything seems to
work just fine.

In the file src/gmxlib/disre.c, I got rid of the following code, which starts
at line 147 in gmx 4.6.7:

    if (dd->dr_tau != 0 || ir->eDisre == edrEnsemble || cr->ms != NULL ||
        dd->nres != dd->npair)
    {
        gmx_fatal(FARGS, "Time or ensemble averaged or multiple pair distance restraints do not work (yet) with domain decomposition, use particle decomposition (mdrun option -pd)");
    }

I tested by taking the temperature way up and running with and without the
distance restraints in a 2-replica REMD simulation. It's a short test and
I'll report back if other issues come up later, but for now things seem to be
going as desired.

Note 1: it's the "cr->ms != NULL" condition that leads to the gmx_fatal call
when using -multi -replex together with distance restraints.

Note 2: somebody has clearly already thought about this, because at line 191
of the same file (src/gmxlib/disre.c) REMD is taken into account:

    if (cr && cr->ms != NULL && ptr != NULL && !bIsREMD)

Note 3: using -multidir instead of -multi does not on its own solve the issue
that I originally reported.

Thank you for your help,
Chris.
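P.S. A less drastic alternative to deleting the whole block might be to keep
the fatal error for the genuinely unsupported cases and only exempt the plain
REMD case. This is an untested sketch, not what I actually ran, and it
assumes bIsREMD is in scope at line 147, which I have not checked (it is only
used further down, at line 191):

    /* Sketch: still refuse time-averaged, ensemble-averaged and
     * multiple-pair restraints under domain decomposition, but let
     * ordinary per-replica restraints through when the multi-simulation
     * exists only for replica exchange. */
    if (dd->dr_tau != 0 || ir->eDisre == edrEnsemble ||
        (cr->ms != NULL && !bIsREMD) || dd->nres != dd->npair)
    {
        gmx_fatal(FARGS, "Time or ensemble averaged or multiple pair distance restraints do not work (yet) with domain decomposition, use particle decomposition (mdrun option -pd)");
    }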
[gmx-users] REMD and distance restraints problem in gmx 4.6.7
Dear Users:

I have a system with many distance restraints, designed to maintain helical
character, e.g.:

    [ distance_restraints ]
     90  33  1  1  2  2.745541e-01  3.122595e-01  999  1.0
     97  57  1  2  2  2.876300e-01  2.892921e-01  999  1.0
    114  73  1  3  2  2.704403e-01  2.929642e-01  999  1.0
    ...

Distance restraints are properly turned on in the .mdp file with:

    disre=simple
    disre-fc=1000

The run works fine on a single node (gmx 4.6.7 here and for all that follows):

    mdrun -nt 24 ...

The run also works fine on two nodes:

    ibrun -np 48 mdrun_mpi ...

However, if I try to do temperature replica exchange (REMD), with two replicas
and two nodes like this:

    ibrun -np 48 mdrun_mpi -multi 2 -replex 200 ...

then I get the error message:

    Fatal error:
    Time or ensemble averaged or multiple pair distance restraints do not work
    (yet) with domain decomposition, use particle decomposition (mdrun option
    -pd)

Aside: I tried particle decomposition, but when I do that without the REMD,
simply running the 48-core job that worked fine with domain decomposition, I
get LINCS errors and quickly a crash (note that without -pd I have 25 ns and
counting without error):

    Step 0, time 0 (ps)  LINCS WARNING in simulation 0
    relative constraint deviation after LINCS:
    rms 5.774043, max 48.082966 (between atoms 21554 and 21555)
    bonds that rotated more than 30 degrees:
     atom 1 atom 2  angle  previous, current, constraint length
    ...

So I am stuck with an error message that is not entirely helpful because
(a) the -pd option does not solve the issue even without REMD and also (b) the
issue seems to be related to REMD (because without REMD I can run on multiple
nodes), though that is not mentioned in the error message.

I note that Mark Abraham mentioned here:
http://redmine.gromacs.org/issues/1117 that:

"You can use MPI, you just can't have more than one domain (= MPI rank) per
simulation. For a multi-simulation with distance restraints and not
replica-exchange, you thus must have as many MPI ranks as simulations, so that
each simulation has one rank and thus one domain."

I have trouble interpreting this, as I have always thought that running MPI
across multiple nodes requires multiple domains (apparently = MPI ranks), so I
am confused as to why that is possible without REMD but gets messy with REMD.

Final note: I am not trying to do "Time or ensemble averaged" distance
restraints, and I think that I am not trying to do "multiple pair distance
restraints", unless that simply means having more than one simple distance
restraint. So at the very least I think that the error message that I get is
confusing.

If the solution or source of error is obvious then sorry; maybe I just don't
get MPI well enough.

Thank you for your suggestions,
Chris.
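P.S. For reference, my reading of the columns in those [ distance_restraints ]
lines, going by the distance-restraints section of the manual, is annotated
below. The comments are mine, so please correct me if I have a column wrong:

    [ distance_restraints ]
    ;  ai   aj  type  label  type'      low           up1      up2   fac
    ;  type  = 1 selects the distance-restraint function
    ;  label = pairs sharing a label form one restraint with multiple pairs
    ;          (every label in my file is unique, so none of mine do)
    ;  type' = 2 means the restraint is never time- or ensemble-averaged
    ;  low, up1, up2 are in nm; fac scales disre-fc for this restraint
       90   33    1     1      2    2.745541e-01  3.122595e-01   999   1.0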