Re: [gmx-users] Too much PME mesh wall time.

2014-08-25 Thread Mark Abraham
On Sun, Aug 24, 2014 at 2:19 AM, Yunlong Liu yliu...@jh.edu wrote:

 Hi gromacs users,

 I ran into a problem with too much PME mesh time in my simulation. The
 following is my time accounting. I am running the simulation on 2 nodes,
 each with 16 CPU cores and 1 Nvidia Tesla K20m GPU.

 And my mdrun command is ibrun /work/03002/yliu120/gromacs-5/bin/mdrun_mpi
 -pin on -ntomp 8 -dlb no -deffnm pi3k-wt-charm-4 -gpu_id 00.

 I manually turned off dlb because the simulation crashes when it is turned
 on. I have reported this to both mailing lists and talked to Roland.


Hmm. This shouldn't happen. Can you please open an issue at
http://redmine.gromacs.org/ and upload enough info for us to replicate it?


   R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 On 4 MPI ranks, each using 8 OpenMP threads

 Computing:           Num    Num       Call    Wall time     Giga-Cycles
                      Ranks  Threads  Count      (s)         total sum     %
 ---------------------------------------------------------------------------
 Domain decomp.        4      8          15     1592.099     137554.334   2.2
 DD comm. load         4      8         751        0.057          4.947   0.0
 Neighbor search       4      8      150001      665.072      57460.919   0.9
 Launch GPU ops.       4      8        1502      967.023      83548.916   1.3
 Comm. coord.          4      8         735     2488.263     214981.185   3.5
 Force                 4      8         751     7037.401     608018.042   9.8
 Wait + Comm. F        4      8         751     3931.222     339650.132   5.5
*PME mesh              4      8         751    40799.937    3525036.971  56.7*
 Wait GPU nonlocal     4      8         751     1985.151     171513.300   2.8
 Wait GPU local        4      8         751       68.365       5906.612   0.1
 NB X/F buffer ops.    4      8        2972     1229.406     106218.328   1.7
 Write traj.           4      8         830       28.245       2440.304   0.0
 Update                4      8         751     2479.611     214233.669   3.4
 Constraints           4      8         751     7041.030     608331.635   9.8
 Comm. energies        4      8      150001       14.250       1231.154   0.0
 Rest                                            1601.588     138374.139   2.2
 ---------------------------------------------------------------------------
 Total                                          71928.719    6214504.588 100.0
 ---------------------------------------------------------------------------
 Breakdown of PME mesh computation
 ---------------------------------------------------------------------------
 PME redist. X/F       4      8        1502     8362.454     722500.151  11.6
 PME spread/gather     4      8        1502    14836.350    1281832.463  20.6
 PME 3D-FFT            4      8        1502     8985.776     776353.949  12.5
 PME 3D-FFT Comm.      4      8        1502     7547.935     652127.220  10.5
 PME solve Elec        4      8         751     1025.249      88579.550   1.4
 ---------------------------------------------------------------------------

 First, I would like to know whether this is a big problem, and second, how
 can I improve my performance?


There is no such thing as "too much" PME mesh time here. With the GPU doing
the short-ranged work, the only work left for the CPU is the bondeds (in
"Force" above) and the long-range part ("PME mesh"). Those ought to dominate
the run time, and roughly in that ratio, for a typical biomolecular system.
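As a quick sanity check, here is a small Python sketch using numbers copied
straight from your accounting table; it just confirms that those two entries
already account for about two thirds of the wall time:

# Rough check against the accounting table above: with the short-ranged
# non-bonded work offloaded to the GPUs, CPU time should be dominated by
# the bondeds ("Force") and the long-ranged part ("PME mesh").
total    = 71928.719    # s, "Total" row
force    = 7037.401     # s, "Force" row (bonded interactions on the CPU)
pme_mesh = 40799.937    # s, "PME mesh" row
share = 100.0 * (force + pme_mesh) / total
print("Force + PME mesh: %.1f%% of total wall time" % share)
# prints about 66.5%, i.e. roughly two thirds of the run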

 Does it mean that my GPU is running too fast and the CPU is waiting?


Looks balanced - if the GPU had too much work then the Wait GPU times would
be appreciable. What did the PP-PME load balancing at the start of the run
look like?


 BTW, what does the wait GPU nonlocal refer to?


When using DD and GPUs, the short-ranged work on each PP rank is decomposed
into a set whose resulting forces are needed by other (non-local) PP ranks,
and the rest. The non-local work is done first, so that once the PME mesh
work is done, the PP-PP MPI communication can be overlapped with the local
short-ranged GPU work. The 0.1% time for Wait GPU local indicates that the
communication took longer than the local work, perhaps because there was not
much of the latter or it was already complete. Unfortunately, it is not
always possible to get timing information from CUDA without slowing down the
run. What actually happens is strongly dependent on the hardware and the
simulation system.
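To make that scheduling concrete, here is a toy Python model of one such step
(this is not GROMACS code; the function and all the timings in it are invented
purely for illustration). It shows why the wait for the non-local results can
be noticeable while the wait for the local results ends up near zero, as in
your table (2.8% vs 0.1%):

# Toy model (not GROMACS source) of the overlap described above.  Per step,
# the CPU launches the non-local and then the local non-bonded GPU kernels,
# does its own PME mesh work, and only waits on the GPU for whatever has not
# already finished.  All names and numbers here are invented.

def wait_times(gpu_nonlocal, gpu_local, cpu_pme_mesh, mpi_halo_comm):
    """Return (wait_nonlocal, wait_local) for one idealized MD step."""
    # The non-local GPU forces are needed first, right after the CPU has
    # finished its PME mesh work.
    wait_nonlocal = max(0.0, gpu_nonlocal - cpu_pme_mesh)
    # The halo exchange of those forces then overlaps with the local GPU
    # kernel, which on the device runs after the non-local one.
    cpu_busy = cpu_pme_mesh + wait_nonlocal + mpi_halo_comm
    wait_local = max(0.0, (gpu_nonlocal + gpu_local) - cpu_busy)
    return wait_nonlocal, wait_local

# Invented per-step timings, in milliseconds:
print(wait_times(gpu_nonlocal=5.0, gpu_local=0.8,
                 cpu_pme_mesh=4.5, mpi_halo_comm=1.0))
# prints (0.5, 0.0): some waiting for the non-local results, essentially
# none for the local ones, the same pattern as in the table above.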

Mark

Thank you.
 Yunlong

 --

 
 Yunlong Liu, PhD Candidate
 Computational Biology and Biophysics
 Department of Biophysics and Biophysical Chemistry
 School of Medicine, The Johns Hopkins University
 Email: yliu...@jhmi.edu
 Address: 725 N Wolfe St, WBSB RM 601, 21205
 

