Re: [gmx-users] replica exchange simulations performance issues.
On Tue, Mar 31, 2020 at 1:45 AM Miro Astore wrote:
> I got up to 25-26 ns/day with my 4-replica system (the same logic scaled
> up to 73 replicas), which I think is reasonable. Could I do better?

Hard to say without a complete log file. Please share log files from both a
single run and a multi run.

> mpirun -np 48 gmx_mpi mdrun -ntomp 1 -v -deffnm memb_prod1 -multidir 1 2 3 4 -replex 1000
>
> I have tried following the manual, but I don't think I'm doing it right;
> I keep getting errors. If you have a minute to suggest how I could do
> this, I would appreciate it.

Again, the exact error messages and the associated command line/log are
necessary to be able to give further suggestions.

--
Szilárd
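A side note for anyone reproducing this: with -multidir, each replica runs in
its own directory and writes its own log there, so the files requested above
can be collected with something like the sketch below. The standalone/ path
for the single-run log is hypothetical; the replica directories 1-4 and
-deffnm memb_prod1 are taken from the commands in this thread.

  # Sketch: bundle the per-replica logs from the -multidir run together with
  # a standalone-run log for comparison. standalone/ is a placeholder path.
  tar czf remd_logs.tar.gz 1/memb_prod1.log 2/memb_prod1.log \
      3/memb_prod1.log 4/memb_prod1.log standalone/memb_prod1.log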
Re: [gmx-users] replica exchange simulations performance issues.
I got up to 25-26 ns/day with my 4-replica system (the same logic scaled up
to 73 replicas), which I think is reasonable. Could I do better?

mpirun -np 48 gmx_mpi mdrun -ntomp 1 -v -deffnm memb_prod1 -multidir 1 2 3 4 -replex 1000

I have tried following the manual, but I don't think I'm doing it right; I
keep getting errors. If you have a minute to suggest how I could do this, I
would appreciate it.

Log file accounting:

     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 12 MPI ranks

 Computing:            Num    Num      Call      Wall time    Giga-Cycles
                       Ranks  Threads  Count        (s)       total sum    %
-----------------------------------------------------------------------------
 Domain decomp.          12     1       26702     251.490      8731.137   1.5
 DD comm. load           12     1       25740       1.210        42.003   0.0
 DD comm. bounds         12     1       26396       9.627       334.238   0.1
 Neighbor search         12     1       25862     283.564      9844.652   1.7
 Launch GPU ops.         12     1     5004002     343.309     11918.867   2.0
 Comm. coord.            12     1     2476139     508.526     17654.811   3.0
 Force                   12     1     2502001     419.341     14558.495   2.5
 Wait + Comm. F          12     1     2502001     347.752     12073.100   2.1
 PME mesh                12     1     2502001   11721.893    406955.915  69.2
 Wait Bonded GPU         12     1        2503       0.008         0.285   0.0
 Wait GPU NB nonloc.     12     1     2502001      48.918      1698.317   0.3
 Wait GPU NB local       12     1     2502001      19.475       676.141   0.1
 NB X/F buffer ops.      12     1     9956280     753.489     26159.337   4.5
 Write traj.             12     1         519       1.078        37.427   0.0
 Update                  12     1     2502001     434.272     15076.886   2.6
 Constraints             12     1     2502001     701.800     24364.800   4.1
 Comm. energies          12     1      125942      36.574      1269.776   0.2
 Rest                                            1047.855     36378.988   6.2
-----------------------------------------------------------------------------
 Total                                          16930.182    587775.176 100.0
-----------------------------------------------------------------------------
 Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME redist. X/F         12     1     5004002    1650.247     57292.604   9.7
 PME spread              12     1     2502001    4133.126    143492.183  24.4
 PME gather              12     1     2502001    2303.327     79965.968  13.6
 PME 3D-FFT              12     1     5004002    2119.410     73580.828  12.5
 PME 3D-FFT Comm.        12     1     5004002     918.318     31881.804   5.4
 PME solve Elec          12     1     2502001     584.446     20290.548   3.5
-----------------------------------------------------------------------------

Best,
Miro

On Tue, Mar 31, 2020 at 9:58 AM Szilárd Páll wrote:
> If you share some log files of a standalone and a replex run, we can
> advise where the performance loss comes from.
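One observation on the accounting above: PME mesh alone accounts for 69.2% of
the wall time, all of it on the CPU. A possible experiment, sketched below, is
to offload PME to the GPUs alongside the short-range nonbondeds; with one rank
per simulation there is no domain decomposition to complicate the offload.
Whether -pme gpu is usable in a given -multidir/-replex run depends on the
GROMACS version, build, and simulation settings, so treat this as something to
test rather than a known-good command.

  # Sketch: offload PME as well as the short-range nonbondeds to the GPUs.
  # -pme gpu requires GROMACS >= 2018 and a single PME rank per simulation.
  mpirun -np 4 gmx_mpi mdrun -ntomp 12 -pin on -nb gpu -pme gpu \
      -deffnm memb_prod1 -multidir 1 2 3 4 -replex 1000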
Re: [gmx-users] replica exchange simulations performance issues.
On Sun, Mar 29, 2020 at 3:56 AM Miro Astore wrote:
> Hi everybody. I've been experimenting with REMD for my system running on
> 48 cores with 4 GPUs (I will need to scale up to 73 replicas, because
> this is a complicated system with many DOF; I'm open to being told this
> is all a silly idea).

It is a bad idea: you should have at least 1 physical core per replica, and
with a large system ideally more. However, if you are going for high
efficiency (aggregate ns/day per physical node), always put at least 2
replicas per GPU.

> My run configuration is
> mpirun -np 4 --map-by numa gmx_mpi mdrun -cpi memb_prod1.cpt -ntomp 11 -v -deffnm memb_prod1 -multidir 1 2 3 4 -replex 1000
>
> The best I can squeeze out of this is 9 ns/day. In a non-replica
> simulation I can hit 50 ns/day with a single GPU and 12 cores.

That is abnormal and indicates that:
- either something is wrong with the hardware mapping/assignment in your
  run: simply use "-pin on" and let mdrun manage thread pinning (that
  --map-by numa is certainly not optimal); I also advise against tweaking
  the thread count to odd values like 11 (just use a quarter of the node's
  48 cores, i.e. 12); or
- your exchange overhead is very high (check the communication cost in the
  log).

If you share some log files of a standalone and a replex run, we can advise
where the performance loss comes from.

Cheers,
--
Szilárd

> Looking at my accounting, for a single replica 52% of the time is being
> spent in the "Force" category, with 92% of my Mflops going into NxN
> Ewald Elec. + LJ [F].
>
> I'm wondering what I could do to reduce this bottleneck, if anything.
>
> Thank you.
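Put together, that advice amounts to roughly the sketch below, for the same
48-core/4-GPU node and replica directories 1-4 used in this thread
(GPU-to-replica assignment is left to mdrun's defaults).

  # Sketch: one MPI rank per replica, a quarter of the node's 48 cores
  # (12 OpenMP threads) per rank, and mdrun-managed thread pinning.
  mpirun -np 4 gmx_mpi mdrun -ntomp 12 -pin on \
      -cpi memb_prod1.cpt -deffnm memb_prod1 -multidir 1 2 3 4 -replex 1000

For the 2-replicas-per-GPU layout mentioned above, the same pattern applies
with 8 replicas and -ntomp 6.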
Re: [gmx-users] replica exchange simulations performance issues.
After much experimentation I managed to run

mpirun -np 48 gmx_mpi mdrun -ntomp 1 -v -deffnm memb_prod1 -multidir 1 2 3 4 -replex 1000

on a single node at 27 ns/day. This scaled up to 73 replicas of my
190,000-atom system (using the same logic, -np num_sims*12) on our Gadi
cluster in Australia. I will soon see if I can get away with fewer replicas.

Thanks for your help.

Miro

On Sun, Mar 29, 2020 at 9:04 PM Benson Muite wrote:
> What happens for a small number of replicas?
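For reference, the same launch pattern scaled to all 73 replicas would look
roughly like the sketch below (replica directories assumed to be named 1
through 73; spreading the 876 ranks over nodes is left to the scheduler and
the MPI runtime).

  # Sketch: 73 replicas at 12 ranks each (-np num_sims*12, as described above).
  NREP=73
  NP=$((NREP * 12))    # 876 MPI ranks in total
  mpirun -np $NP gmx_mpi mdrun -ntomp 1 -pin on \
      -deffnm memb_prod1 -multidir $(seq 1 $NREP) -replex 1000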
Re: [gmx-users] replica exchange simulations performance issues.
On Sun, Mar 29, 2020, at 4:55 AM, Miro Astore wrote:
> Hi everybody. I've been experimenting with REMD for my system running on
> 48 cores with 4 GPUs (I will need to scale up to 73 replicas, because
> this is a complicated system with many DOF; I'm open to being told this
> is all a silly idea).
>
> My run configuration is
> mpirun -np 4 --map-by numa gmx_mpi mdrun -cpi memb_prod1.cpt -ntomp 11 -v -deffnm memb_prod1 -multidir 1 2 3 4 -replex 1000
>
> The best I can squeeze out of this is 9 ns/day. In a non-replica
> simulation I can hit 50 ns/day with a single GPU and 12 cores.

What happens for a small number of replicas?

> Looking at my accounting, for a single replica 52% of the time is being
> spent in the "Force" category, with 92% of my Mflops going into NxN
> Ewald Elec. + LJ [F].
>
> I'm wondering what I could do to reduce this bottleneck, if anything.

Do you have access to more hardware? There are a number of HPC centers in
Australia.
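The first question lends itself to a quick empirical test. A sketch, assuming
fresh replica directories 1-8 each containing a memb_prod1.tpr, that runs a
short benchmark at a few replica counts and pulls the ns/day figure from each
log afterwards (-nsteps just caps the benchmark length):

  # Sketch: short REMD benchmarks at 2, 4, and 8 replicas on one 48-core node.
  for NREP in 2 4 8; do
      NTOMP=$((48 / NREP))    # split the 48 cores evenly across replicas
      mpirun -np $NREP gmx_mpi mdrun -ntomp $NTOMP -pin on -nsteps 50000 \
          -deffnm memb_prod1 -multidir $(seq 1 $NREP) -replex 1000
      grep -h "Performance:" $(seq -f "%g/memb_prod1.log" 1 $NREP)
  done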
Re: [gmx-users] replica exchange simulations performance issues.
Correction: 99.3% (not 92%) of the Mflops is going into NxN Ewald Elec. + LJ [F].

On Sun, Mar 29, 2020 at 12:55 PM Miro Astore wrote:
> Looking at my accounting, for a single replica 52% of the time is being
> spent in the "Force" category, with 92% of my Mflops going into NxN
> Ewald Elec. + LJ [F].

Miro
[gmx-users] replica exchange simulations performance issues.
Hi everybody. I've been experimenting with REMD for my system running on 48
cores with 4 GPUs (I will need to scale up to 73 replicas, because this is a
complicated system with many DOF; I'm open to being told this is all a silly
idea).

My run configuration is

mpirun -np 4 --map-by numa gmx_mpi mdrun -cpi memb_prod1.cpt -ntomp 11 -v -deffnm memb_prod1 -multidir 1 2 3 4 -replex 1000

The best I can squeeze out of this is 9 ns/day. In a non-replica simulation I
can hit 50 ns/day with a single GPU and 12 cores.

Looking at my accounting, for a single replica 52% of the time is being spent
in the "Force" category, with 92% of my Mflops going into NxN Ewald Elec. +
LJ [F].

I'm wondering what I could do to reduce this bottleneck, if anything.

Thank you.
--
Miro A. Astore (he/him)
PhD Candidate | Computational Biophysics
Office 434, A28 School of Physics
University of Sydney
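On the "many DOF" point: in temperature REMD, swaps are accepted often enough
only when the potential-energy distributions of neighboring replicas overlap,
and since energy fluctuations grow only as the square root of the system
size, the number of replicas needed to span a fixed temperature range scales
roughly as the square root of the number of degrees of freedom; hence large
systems need ladders like the 73 replicas discussed here. A geometric ladder
is the usual starting point; the sketch below generates one (the 300-440 K
range and N=73 are illustrative assumptions, not values from this thread).

  # Sketch: geometric REMD temperature ladder, T_i = Tmin*(Tmax/Tmin)^(i/(N-1)).
  # Tmin, Tmax, and N below are illustrative assumptions.
  awk 'BEGIN { Tmin = 300; Tmax = 440; N = 73;
               for (i = 0; i < N; i++)
                   printf "%6.2f\n", Tmin * (Tmax/Tmin)^(i/(N-1)) }'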