Re: [gmx-users] replica exchange simulations performance issues.
On Tue, Mar 31, 2020 at 1:45 AM Miro Astore wrote:
> I got up to 25-26 ns/day with my 4-replica system (the same logic scaled
> up to 73 replicas), which I think is reasonable. Could I do better?

Hard to say without a complete log file. Please share log files from both a
single run and a multi run.

> mpirun -np 48 gmx_mpi mdrun -ntomp 1 -v -deffnm memb_prod1 -multidir 1 2 3 4 -replex 1000
>
> I have tried following the manual, but I don't think I'm doing it right;
> I keep getting errors. If you have a minute to suggest how I could do
> this, I would appreciate it.

Again, the exact error messages and the associated command line/log are
necessary to be able to give further suggestions.

--
Szilárd
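A side note for anyone reproducing this: with -multidir, each replica runs in
its own directory and writes its own log there, so the files requested above
can be collected with something like the sketch below. The standalone/ path
for the single-run log is hypothetical; the replica directories 1-4 and
-deffnm memb_prod1 are taken from the commands in this thread.

  # Sketch: bundle the per-replica logs from the -multidir run together with
  # a standalone-run log for comparison. standalone/ is a placeholder path.
  tar czf remd_logs.tar.gz 1/memb_prod1.log 2/memb_prod1.log \
      3/memb_prod1.log 4/memb_prod1.log standalone/memb_prod1.log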
Re: [gmx-users] replica exchange simulations performance issues.
I got up to 25-26 ns/day with my 4-replica system (the same logic scaled up
to 73 replicas), which I think is reasonable. Could I do better?

mpirun -np 48 gmx_mpi mdrun -ntomp 1 -v -deffnm memb_prod1 -multidir 1 2 3 4 -replex 1000

I have tried following the manual, but I don't think I'm doing it right; I
keep getting errors. If you have a minute to suggest how I could do this, I
would appreciate it.

Log file accounting:

     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 12 MPI ranks

 Computing:            Num    Num      Call      Wall time    Giga-Cycles
                       Ranks  Threads  Count        (s)       total sum    %
-----------------------------------------------------------------------------
 Domain decomp.          12     1       26702     251.490      8731.137   1.5
 DD comm. load           12     1       25740       1.210        42.003   0.0
 DD comm. bounds         12     1       26396       9.627       334.238   0.1
 Neighbor search         12     1       25862     283.564      9844.652   1.7
 Launch GPU ops.         12     1     5004002     343.309     11918.867   2.0
 Comm. coord.            12     1     2476139     508.526     17654.811   3.0
 Force                   12     1     2502001     419.341     14558.495   2.5
 Wait + Comm. F          12     1     2502001     347.752     12073.100   2.1
 PME mesh                12     1     2502001   11721.893    406955.915  69.2
 Wait Bonded GPU         12     1        2503       0.008         0.285   0.0
 Wait GPU NB nonloc.     12     1     2502001      48.918      1698.317   0.3
 Wait GPU NB local       12     1     2502001      19.475       676.141   0.1
 NB X/F buffer ops.      12     1     9956280     753.489     26159.337   4.5
 Write traj.             12     1         519       1.078        37.427   0.0
 Update                  12     1     2502001     434.272     15076.886   2.6
 Constraints             12     1     2502001     701.800     24364.800   4.1
 Comm. energies          12     1      125942      36.574      1269.776   0.2
 Rest                                            1047.855     36378.988   6.2
-----------------------------------------------------------------------------
 Total                                          16930.182    587775.176 100.0
-----------------------------------------------------------------------------
 Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME redist. X/F         12     1     5004002    1650.247     57292.604   9.7
 PME spread              12     1     2502001    4133.126    143492.183  24.4
 PME gather              12     1     2502001    2303.327     79965.968  13.6
 PME 3D-FFT              12     1     5004002    2119.410     73580.828  12.5
 PME 3D-FFT Comm.        12     1     5004002     918.318     31881.804   5.4
 PME solve Elec          12     1     2502001     584.446     20290.548   3.5
-----------------------------------------------------------------------------

Best,
Miro

On Tue, Mar 31, 2020 at 9:58 AM Szilárd Páll wrote:
> If you share some log files of a standalone and a replex run, we can
> advise where the performance loss comes from.
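One observation on the accounting above: PME mesh alone accounts for 69.2% of
the wall time, all of it on the CPU. A possible experiment, sketched below, is
to offload PME to the GPUs alongside the short-range nonbondeds; with one rank
per simulation there is no domain decomposition to complicate the offload.
Whether -pme gpu is usable in a given -multidir/-replex run depends on the
GROMACS version, build, and simulation settings, so treat this as something to
test rather than a known-good command.

  # Sketch: offload PME as well as the short-range nonbondeds to the GPUs.
  # -pme gpu requires GROMACS >= 2018 and a single PME rank per simulation.
  mpirun -np 4 gmx_mpi mdrun -ntomp 12 -pin on -nb gpu -pme gpu \
      -deffnm memb_prod1 -multidir 1 2 3 4 -replex 1000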
Re: [gmx-users] replica exchange simulations performance issues.
On Sun, Mar 29, 2020 at 3:56 AM Miro Astore wrote:
> Hi everybody. I've been experimenting with REMD for my system running on
> 48 cores with 4 GPUs (I will need to scale up to 73 replicas, because
> this is a complicated system with many DOF; I'm open to being told this
> is all a silly idea).

It is a bad idea: you should have at least 1 physical core per replica, and
with a large system ideally more. However, if you are going for high
efficiency (aggregate ns/day per physical node), always put at least 2
replicas per GPU.

> My run configuration is
> mpirun -np 4 --map-by numa gmx_mpi mdrun -cpi memb_prod1.cpt -ntomp 11 -v -deffnm memb_prod1 -multidir 1 2 3 4 -replex 1000
>
> The best I can squeeze out of this is 9 ns/day. In a non-replica
> simulation I can hit 50 ns/day with a single GPU and 12 cores.

That is abnormal and indicates that:
- either something is wrong with the hardware mapping/assignment in your
  run: simply use "-pin on" and let mdrun manage thread pinning (that
  --map-by numa is certainly not optimal); I also advise against tweaking
  the thread count to odd values like 11 (just use a quarter of the node's
  48 cores, i.e. 12); or
- your exchange overhead is very high (check the communication cost in the
  log).

If you share some log files of a standalone and a replex run, we can advise
where the performance loss comes from.

Cheers,
--
Szilárd

> Looking at my accounting, for a single replica 52% of the time is being
> spent in the "Force" category, with 92% of my Mflops going into NxN
> Ewald Elec. + LJ [F].
>
> I'm wondering what I could do to reduce this bottleneck, if anything.
>
> Thank you.
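Put together, that advice amounts to roughly the sketch below, for the same
48-core/4-GPU node and replica directories 1-4 used in this thread
(GPU-to-replica assignment is left to mdrun's defaults).

  # Sketch: one MPI rank per replica, a quarter of the node's 48 cores
  # (12 OpenMP threads) per rank, and mdrun-managed thread pinning.
  mpirun -np 4 gmx_mpi mdrun -ntomp 12 -pin on \
      -cpi memb_prod1.cpt -deffnm memb_prod1 -multidir 1 2 3 4 -replex 1000

For the 2-replicas-per-GPU layout mentioned above, the same pattern applies
with 8 replicas and -ntomp 6.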
Re: [gmx-users] replica exchange simulations performance issues.
After much experimentation I managed to run

mpirun -np 48 gmx_mpi mdrun -ntomp 1 -v -deffnm memb_prod1 -multidir 1 2 3 4 -replex 1000

on a single node at 27 ns/day. This scaled up to 73 replicas of my
190,000-atom system (using the same logic, -np num_sims*12) on our Gadi
cluster in Australia. I will soon see if I can get away with fewer replicas.

Thanks for your help.

Miro

On Sun, Mar 29, 2020 at 9:04 PM Benson Muite wrote:
> What happens for a small number of replicas?
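For reference, the same launch pattern scaled to all 73 replicas would look
roughly like the sketch below (replica directories assumed to be named 1
through 73; spreading the 876 ranks over nodes is left to the scheduler and
the MPI runtime).

  # Sketch: 73 replicas at 12 ranks each (-np num_sims*12, as described above).
  NREP=73
  NP=$((NREP * 12))    # 876 MPI ranks in total
  mpirun -np $NP gmx_mpi mdrun -ntomp 1 -pin on \
      -deffnm memb_prod1 -multidir $(seq 1 $NREP) -replex 1000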
Re: [gmx-users] replica exchange simulations performance issues.
On Sun, Mar 29, 2020, at 4:55 AM, Miro Astore wrote:
> Hi everybody. I've been experimenting with REMD for my system running on
> 48 cores with 4 GPUs (I will need to scale up to 73 replicas, because
> this is a complicated system with many DOF; I'm open to being told this
> is all a silly idea).
>
> My run configuration is
> mpirun -np 4 --map-by numa gmx_mpi mdrun -cpi memb_prod1.cpt -ntomp 11 -v -deffnm memb_prod1 -multidir 1 2 3 4 -replex 1000
>
> The best I can squeeze out of this is 9 ns/day. In a non-replica
> simulation I can hit 50 ns/day with a single GPU and 12 cores.

What happens for a small number of replicas?

> Looking at my accounting, for a single replica 52% of the time is being
> spent in the "Force" category, with 92% of my Mflops going into NxN
> Ewald Elec. + LJ [F].
>
> I'm wondering what I could do to reduce this bottleneck, if anything.

Do you have access to more hardware? There are a number of HPC centers in
Australia.
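The first question lends itself to a quick empirical test. A sketch, assuming
fresh replica directories 1-8 each containing a memb_prod1.tpr, that runs a
short benchmark at a few replica counts and pulls the ns/day figure from each
log afterwards (-nsteps just caps the benchmark length):

  # Sketch: short REMD benchmarks at 2, 4, and 8 replicas on one 48-core node.
  for NREP in 2 4 8; do
      NTOMP=$((48 / NREP))    # split the 48 cores evenly across replicas
      mpirun -np $NREP gmx_mpi mdrun -ntomp $NTOMP -pin on -nsteps 50000 \
          -deffnm memb_prod1 -multidir $(seq 1 $NREP) -replex 1000
      grep -h "Performance:" $(seq -f "%g/memb_prod1.log" 1 $NREP)
  done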
Re: [gmx-users] replica exchange simulations performance issues.
Correction: 99.3% (not 92%) of the Mflops is going into NxN Ewald Elec. + LJ [F].

On Sun, Mar 29, 2020 at 12:55 PM Miro Astore wrote:
> Looking at my accounting, for a single replica 52% of the time is being
> spent in the "Force" category, with 92% of my Mflops going into NxN
> Ewald Elec. + LJ [F].

Miro
[gmx-users] replica exchange simulations performance issues.
Hi everybody. I've been experimenting with REMD for my system running on 48
cores with 4 GPUs (I will need to scale up to 73 replicas, because this is a
complicated system with many DOF; I'm open to being told this is all a silly
idea).

My run configuration is

mpirun -np 4 --map-by numa gmx_mpi mdrun -cpi memb_prod1.cpt -ntomp 11 -v -deffnm memb_prod1 -multidir 1 2 3 4 -replex 1000

The best I can squeeze out of this is 9 ns/day. In a non-replica simulation I
can hit 50 ns/day with a single GPU and 12 cores.

Looking at my accounting, for a single replica 52% of the time is being spent
in the "Force" category, with 92% of my Mflops going into NxN Ewald Elec. +
LJ [F].

I'm wondering what I could do to reduce this bottleneck, if anything.

Thank you.
--
Miro A. Astore (he/him)
PhD Candidate | Computational Biophysics
Office 434, A28 School of Physics
University of Sydney
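On the "many DOF" point: in temperature REMD, swaps are accepted often enough
only when the potential-energy distributions of neighboring replicas overlap,
and since energy fluctuations grow only as the square root of the system
size, the number of replicas needed to span a fixed temperature range scales
roughly as the square root of the number of degrees of freedom; hence large
systems need ladders like the 73 replicas discussed here. A geometric ladder
is the usual starting point; the sketch below generates one (the 300-440 K
range and N=73 are illustrative assumptions, not values from this thread).

  # Sketch: geometric REMD temperature ladder, T_i = Tmin*(Tmax/Tmin)^(i/(N-1)).
  # Tmin, Tmax, and N below are illustrative assumptions.
  awk 'BEGIN { Tmin = 300; Tmax = 440; N = 73;
               for (i = 0; i < N; i++)
                   printf "%6.2f\n", Tmin * (Tmax/Tmin)^(i/(N-1)) }'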