Re: [gmx-users] which item is gpu wait for cpu?

2014-09-25 Thread Theodore Si

That's a breakdown of the timings that CUDA cannot always measure.

What does that mean?

We ran the same test case on the same machine. The GPU timings section did not
show up before, but now it appears.

On 9/25/2014 6:11 PM, Mark Abraham wrote:

On Thu, Sep 25, 2014 at 11:57 AM, xiexiao...@sjtu.edu.cn 
xiexiao...@sjtu.edu.cn wrote:


Thanks. I have another question. I used the command mdrun_mpi -v -deffnm test to
run the .tpr file and found some new information in my log file:

 GPU timings
-----------------------------------------------------------------
 Computing:                 Count  Wall t (s)  ms/step       %
-----------------------------------------------------------------
 Pair list H2D                501       0.364    0.726     0.4
 X / q H2D                  20001       4.153    0.208     4.5
 Nonbonded F kernel         18800      77.657    4.131    84.5
 Nonbonded F+ene k.           700       4.060    5.800     4.4
 Nonbonded F+prune k.         400       2.121    5.303     2.3
 Nonbonded F+ene+prune k.     101       0.717    7.103     0.8
 F D2H                      20001       2.873    0.144     3.1
-----------------------------------------------------------------
 Total                               91.946    4.597   100.0
-----------------------------------------------------------------

Force evaluation time GPU/CPU: 4.597 ms/5.245 ms = 0.876
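As a sanity check, the derived columns of that table can be recomputed from the raw
counts and wall times. The arithmetic below is ours, using the numbers printed above:

```python
# Reproduce the derived columns of the GPU timings table above.
# Wall times are in seconds; counts are kernel/transfer launches.
timings = {
    "Pair list H2D":            (501,   0.364),
    "X / q H2D":                (20001, 4.153),
    "Nonbonded F kernel":       (18800, 77.657),
    "Nonbonded F+ene k.":       (700,   4.060),
    "Nonbonded F+prune k.":     (400,   2.121),
    "Nonbonded F+ene+prune k.": (101,   0.717),
    "F D2H":                    (20001, 2.873),
}

total_wall = sum(w for _, w in timings.values())      # ~91.946 s (rounding)
n_steps = 20001                                       # one X/q transfer per step
gpu_ms_per_step = total_wall / n_steps * 1000         # ~4.597 ms/step

# The log reports: Force evaluation time GPU/CPU: 4.597 ms / 5.245 ms
cpu_ms_per_step = 5.245
ratio = gpu_ms_per_step / cpu_ms_per_step             # ~0.876

print(round(total_wall, 3), round(gpu_ms_per_step, 3), round(ratio, 3))
```

A ratio just below 1 means the GPU's per-step force work finishes slightly faster than
the CPU's, i.e. the two sides are reasonably balanced in this run.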

But when I used the same command before, this information did not appear. Does
anyone know why?


That's a breakdown of the timings that CUDA cannot always measure.



I also found that the Nonbonded F kernel takes 77.657 s, but the Force item takes
only 55.263 s. Does the Force item include the non-bonded force?


No, it's non-PME work being done on the CPU (e.g. bonds, restraints,
free-energy short-ranged)

Mark
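Because the short-range GPU kernels run concurrently with the CPU force work, the
per-step wall time is bounded by the slower of the two, not by their sum. A toy
illustration (our arithmetic, using the two totals quoted above):

```python
# The GPU nonbonded kernel and the CPU bonded-force work overlap in time,
# so the GPU kernel total (77.657 s) exceeding the CPU Force total
# (55.263 s) just means the CPU finishes its share first and then waits
# for the GPU (the "Wait GPU" entries in the log).
gpu_nonbonded_s = 77.657
cpu_force_s = 55.263

# Under perfect overlap the combined force time is bounded by the max,
# and the CPU-side wait is roughly the difference.
combined_s = max(gpu_nonbonded_s, cpu_force_s)
cpu_wait_estimate_s = gpu_nonbonded_s - cpu_force_s

print(combined_s, round(cpu_wait_estimate_s, 3))
```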

If that is true, why does the Nonbonded F kernel take more time than the Force item?



xiexiao...@sjtu.edu.cn

From: Szilárd Páll
Date: 2014-09-24 17:45
To: Discussion list for GROMACS users
CC: gromacs.org_gmx-users
Subject: Re: [gmx-users] which item is gpu wait for cpu?
On Wed, Sep 24, 2014 at 4:57 AM, xiexiao...@sjtu.edu.cn
xiexiao...@sjtu.edu.cn wrote:

I think that the items Wait GPU nonlocal and Wait GPU local in the log
file are the time the CPU spends waiting for the GPU. Does anyone know
which item in the log file is the time the GPU spends waiting for the CPU?

Due to a CUDA runtime limitation, that cannot be measured except in the
no-domain-decomposition case (where it is expressed as overlap).



xiexiao...@sjtu.edu.cn
--
Gromacs Users mailing list

* Please search the archive at
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
send a mail to gmx-users-requ...@gromacs.org.





[gmx-users] How are the clusters of particles created?

2014-09-18 Thread Theodore Si

Hi all,

On page 34 of the manual:

"The Verlet cut-off scheme is implemented in a very efficient fashion based on
clusters of particles. The simplest example is a cluster size of 4 particles.
The pair list is then constructed based on cluster pairs."

I want to know under what conditions 4 particles will be put in one cluster.
Can you explain the algorithm in a simple manner?
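A rough sketch of the idea (illustrative only, not the actual GROMACS implementation,
which bins particles on a spatial grid and compares cluster bounding boxes): particles
that are spatially close are packed into groups of 4, and the pair list then stores
pairs of clusters rather than pairs of particles:

```python
# Simplified sketch of cluster-based pair-list construction with a
# cluster size of 4. Sorting along x stands in for spatial binning,
# and the O(N^2) cluster scan stands in for the grid search.
import math

def make_clusters(coords, cluster_size=4):
    """Sort particles along x and pack consecutive ones into clusters of 4."""
    order = sorted(range(len(coords)), key=lambda i: coords[i][0])
    return [order[i:i + cluster_size] for i in range(0, len(order), cluster_size)]

def cluster_pair_list(coords, clusters, cutoff):
    """List cluster pairs that contain at least one particle pair within the cutoff."""
    pairs = []
    for a in range(len(clusters)):
        for b in range(a, len(clusters)):
            if any(math.dist(coords[i], coords[j]) <= cutoff
                   for i in clusters[a] for j in clusters[b] if i != j):
                pairs.append((a, b))
    return pairs

coords = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.3, 0.0),
          (5.0, 0.0), (5.1, 0.0), (5.2, 0.0), (5.3, 0.0)]
clusters = make_clusters(coords)                     # two clusters of 4
pairs = cluster_pair_list(coords, clusters, cutoff=1.0)
print(clusters, pairs)                               # far-apart cluster pair excluded
```

So membership in a cluster is about spatial locality (which grid cell the particle
falls into), not about any chemical relationship between the particles.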


BR,
Theo


[gmx-users] On what scale will simulation with PME-dedicated nodes perform better?

2014-09-18 Thread Theodore Si

Hi all,

I ran GROMACS 4.6 on 5 nodes (each with 16 CPU cores and 2 Nvidia K20m GPUs)
and on 4 nodes, in the following ways:


5 nodes:
1. Each node has 8 MPI processes, and one node is used as a PME-dedicated node
2. Each node has 8 MPI processes, and two nodes are used as PME-dedicated nodes
3. Each node has 4 MPI processes, and one node is used as a PME-dedicated node

In these settings, the log files complain that the PME nodes have more work
to do than the PP nodes, and the average imbalance is 20%-40%.


4 nodes:
Each node has 8 MPI processes, and there is no PME-dedicated node.
In the log file, the PME mesh wall time is about half that of the settings
above. My guess is that my run is at too small a scale for PME-dedicated
nodes to do any good.

So, under what conditions should I set PME nodes manually?
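A rough rule of thumb (our sketch, not an official formula; g_tune_pme measures this
properly): separate PME ranks pay off once the PME all-to-all limits scaling, and the
rank split should roughly match PME's share of the total load. The 0.25 load fraction
below is a typical assumed value, not a measurement:

```python
# Rough sketch: pick -npme so the rank split matches the PME share of
# the work. GROMACS estimates this itself, and g_tune_pme measures it;
# pme_load_fraction=0.25 here is a typical assumption.
def suggest_npme(total_ranks, pme_load_fraction=0.25):
    """Suggest a PME-only rank count proportional to the PME load."""
    npme = round(total_ranks * pme_load_fraction)
    return max(1, npme)

# 5 nodes * 8 ranks = 40 ranks. Dedicating one 8-rank node to PME gives
# PME 20% of the ranks for ~25% of the work, so some PME-side imbalance
# (as reported above) is expected.
print(suggest_npme(40))
```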


Re: [gmx-users] Can we set the number of pure PME nodes when using GPU&CPU?

2014-08-25 Thread Theodore Si

I mapped 2 GPUs to multiple MPI ranks by using -gpu_id
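The -gpu_id string contains one GPU id per PP rank on a node, so with 8 PP ranks and
2 GPUs, a string like "00001111" (an assumed example consistent with the log quoted
below) assigns four ranks to each physical GPU. A small sketch of how it decodes:

```python
# Decode an mdrun -gpu_id string: one digit per PP rank on the node.
def decode_gpu_id(gpu_id_string):
    """Return the per-rank GPU assignment encoded in a -gpu_id string."""
    return [int(c) for c in gpu_id_string]

mapping = decode_gpu_id("00001111")
# Four PP ranks share GPU #0 and four share GPU #1, which is why the log
# prints "8 GPUs user-selected for this run: #0, #0, #0, #0, #1, #1, #1, #1"
# even though only 2 physical GPUs were detected: each device is listed
# once per rank that uses it.
print(mapping, mapping.count(0), mapping.count(1))
```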

On 8/26/2014 1:12 AM, Xingcheng Lin wrote:

Theodore Si sjyzhxw@... writes:


Hi,

https://onedrive.live.com/redir?resid=990FCE59E48164A4!2572authkey=!AP82sTNxS6MHgUkithint=file%2clog
https://onedrive.live.com/redir?resid=990FCE59E48164A4!2482authkey=!APLkizOBzXtPHxsithint=file%2clog

These are 2 log files. The first one uses 64 CPU cores (64 / 16 = 4 nodes) and
4 nodes × 2 = 8 GPUs; the second uses 512 CPU cores and no GPU.
When we look at the 64-core log file, we find that in the R E A L   C Y C L E
A N D   T I M E   A C C O U N T I N G table the total wall time is the sum of
every line, i.e. 37.730 = 2.201 + 0.082 + ... + 1.150. So we think that while
the CPUs are doing PME, the GPUs are doing nothing. That is why we say they
are working sequentially.
As for the 512-core log file, the total wall time is approximately the sum of
PME mesh and PME wait for PP. We think this is because the PME-dedicated nodes
finished early, and the total wall time is the time spent on the PP nodes, so
the time spent on PME is hidden.


Hi,

I have a naive question:

In your log file there are only 2 GPUs being detected:

2 GPUs detected on host gpu42:
   #0: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
   #1: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible

In the end you selected 8 GPUs

8 GPUs user-selected for this run: #0, #0, #0, #0, #1, #1, #1, #1

Did you choose 8 GPUs or 2 GPUs? What is your mdrun command?

Thank you,







Re: [gmx-users] Can we set the number of pure PME nodes when using GPU&CPU?

2014-08-25 Thread Theodore Si

Hi  Szilárd,

But CUDA 5.5 won't work with icc 14, right?
It only works with icc 12.1, unless a CUDA 5.5 header is modified.

Theo

On 8/25/2014 9:44 PM, Szilárd Páll wrote:

On Mon, Aug 25, 2014 at 8:08 AM, Mark Abraham mark.j.abra...@gmail.com wrote:

On Mon, Aug 25, 2014 at 5:01 AM, Theodore Si sjyz...@gmail.com wrote:


Hi,

https://onedrive.live.com/redir?resid=990FCE59E48164A4!2572authkey=!AP82sTNxS6MHgUkithint=file%2clog
https://onedrive.live.com/redir?resid=990FCE59E48164A4!2482authkey=!APLkizOBzXtPHxsithint=file%2clog

These are 2 log files. The first one  is using 64 cpu cores(64 / 16 = 4
nodes) and 4nodes*2 = 8 GPUs, and the second is using 512 cpu cores, no GPU.
When we look at the 64 cores log file, we find that in the  R E A L   C Y
C L E   A N D   T I M E   A C C O U N T I N G table, the total wall time is
the sum of every line, that is 37.730=2.201+0.082+...+1.150. So we think
that when the CPUs is doing PME, GPUs are doing nothing. That's why we say
they are working sequentially.


Please note that sequential means one phase after another. Your log
files don't show the timing breakdown for the GPUs, which is distinct from
showing that the GPUs ran and then the CPUs ran (which I don't think the
code even permits!). References to CUDA 8x8 kernels do show the GPU was
active. There was an issue with mdrun not always being able to gather and
publish the GPU timing results; I don't recall the conditions (Szilard
might remember), but it might be fixed in a later release.

It is a limitation (well, I'd say borderline bug) in CUDA that if you
have multiple work-queues (=streams), reliable timing using the CUDA
built-in mechanisms is impossible. There may be a way to work around
this, but that won't happen in the current versions. What's important
is to observe the wait time on the CPU side; and of course, if the OP is
profiling, this is not an issue.


In any case, you
should probably be doing performance optimization on a GROMACS version that
isn't a year old.

I gather that you didn't actually observe the GPUs idle - e.g. with a
performance monitoring tool? Otherwise, and in the absence of a description
of your simulation system, I'd say that log file looks somewhere between
normal and optimal. For the record, for better performance, you should
probably be following the advice of the install guide and not compiling
FFTW with AVX support, and using one of the five gcc minor versions
released since 4.4 ;-)

And besides avoiding ancient gcc versions, I suggest using CUDA 5.5
(which you can use because you have version 5.5 driver which I see in
your log file):

Additionally, I suggest avoiding MKL and using FFTW instead. For the
grid sizes of our interest all benchmarks I did in the past showed
considerably higher FFTW performance. Same goes for icc, but feel free
to benchmark and please report back if you find the opposite.


As for the 512 cores log file, the total wall time is approximately the sum

of PME mesh and PME wait for PP. We think this is because PME-dedicated
nodes finished early, and the total wall time is the time spent on PP
nodes, therefore time spent on PME is covered.


Yes, using an offload model makes it awkward to report CPU timings, because
there are two kinds of CPU ranks. The total of the Wall t column adds up
to twice the total time taken (which is noted explicitly in more recent
mdrun versions). By design, the PME ranks do finish early, as you know from
Figure 3.16 of the manual. As you can see in the table, the PP ranks spend
26% of their time waiting for the results from the PME ranks, and this is
the origin of the note (above the table) that you might want to balance
things better.
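The double-counting described above can be sketched numerically: with separate PME
ranks, the PP and PME rank groups run concurrently, so each group's Wall t entries
span the same elapsed time and the column sums to about twice it. All numbers below
are illustrative, not taken from the log files discussed:

```python
# Sketch of the accounting with separate PME ranks: PP and PME rank
# groups run concurrently, so each group's timing breakdown covers the
# whole elapsed run, and the Wall t column totals ~2x the elapsed time.
# Numbers are illustrative, not from the logs under discussion.
elapsed = 100.0

pp_breakdown = {"Force": 60.0, "Wait + Comm. F": 26.0, "Other": 14.0}
pme_breakdown = {"PME mesh": 70.0, "PME wait for PP": 30.0}

# Each rank group accounts for the full elapsed time...
assert abs(sum(pp_breakdown.values()) - elapsed) < 1e-9
assert abs(sum(pme_breakdown.values()) - elapsed) < 1e-9

# ...so summing the whole column double-counts it.
column_total = sum(pp_breakdown.values()) + sum(pme_breakdown.values())
print(column_total / elapsed)
```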

Mark

On 8/23/2014 9:30 PM, Mark Abraham wrote:

On Sat, Aug 23, 2014 at 1:47 PM, Theodore Si sjyz...@gmail.com wrote:

  Hi,

When we used 2 GPU nodes (each has 2 cpus and 2 gpus) to do a mdrun(with
no PME-dedicated node), we noticed that when CPU are doing PME, GPU are
idle,


That could happen if the GPU completes its work too fast, in which case
the
end of the log file will probably scream about imbalance.

that is they are doing their work sequentially.


Highly unlikely, not least because the code is written to overlap the
short-range work on the GPU with everything else on the CPU. What's your
evidence for *sequential* rather than *imbalanced*?


  Is it supposed to be so?
No, but without seeing your .log files, mdrun command lines and knowing
about your hardware, there's nothing we can say.


  Is it the same reason as GPUs on PME-dedicated nodes won't be used during

a run like you said before?


Why would you suppose that? I said GPUs do work from the PP ranks on their
node. That's true here.

So if we want to exploit our hardware, we should map PP-PME ranks
manually,


right? Say, use one node as PME-dedicated node and leave the GPUs on that
node idle, and use two nodes to do the other stuff. How do you think
about
this arrangement?

  Probably a terrible idea. You should

Re: [gmx-users] Can we set the number of pure PME nodes when using GPU&CPU?

2014-08-24 Thread Theodore Si

Hi,

https://onedrive.live.com/redir?resid=990FCE59E48164A4!2572authkey=!AP82sTNxS6MHgUkithint=file%2clog
https://onedrive.live.com/redir?resid=990FCE59E48164A4!2482authkey=!APLkizOBzXtPHxsithint=file%2clog

These are 2 log files. The first one uses 64 CPU cores (64 / 16 = 4 nodes) and
4 nodes × 2 = 8 GPUs; the second uses 512 CPU cores and no GPU.
When we look at the 64-core log file, we find that in the R E A L   C Y C L E
A N D   T I M E   A C C O U N T I N G table the total wall time is the sum of
every line, i.e. 37.730 = 2.201 + 0.082 + ... + 1.150. So we think that while
the CPUs are doing PME, the GPUs are doing nothing. That is why we say they
are working sequentially.
As for the 512-core log file, the total wall time is approximately the sum of
PME mesh and PME wait for PP. We think this is because the PME-dedicated nodes
finished early, and the total wall time is the time spent on the PP nodes, so
the time spent on PME is hidden.



On 8/23/2014 9:30 PM, Mark Abraham wrote:

On Sat, Aug 23, 2014 at 1:47 PM, Theodore Si sjyz...@gmail.com wrote:


Hi,

When we used 2 GPU nodes (each has 2 cpus and 2 gpus) to do a mdrun(with
no PME-dedicated node), we noticed that when CPU are doing PME, GPU are
idle,


That could happen if the GPU completes its work too fast, in which case the
end of the log file will probably scream about imbalance.

that is they are doing their work sequentially.


Highly unlikely, not least because the code is written to overlap the
short-range work on the GPU with everything else on the CPU. What's your
evidence for *sequential* rather than *imbalanced*?



Is it supposed to be so?


No, but without seeing your .log files, mdrun command lines and knowing
about your hardware, there's nothing we can say.



Is it the same reason as GPUs on PME-dedicated nodes won't be used during
a run like you said before?


Why would you suppose that? I said GPUs do work from the PP ranks on their
node. That's true here.

So if we want to exploit our hardware, we should map PP-PME ranks manually,

right? Say, use one node as PME-dedicated node and leave the GPUs on that
node idle, and use two nodes to do the other stuff. How do you think about
this arrangement?


Probably a terrible idea. You should identify the cause of the imbalance,
and fix that.

Mark



Theo


On 8/22/2014 7:20 PM, Mark Abraham wrote:


Hi,

Because no work will be sent to them. The GPU implementation can
accelerate
domains from PP ranks on their node, but with an MPMD setup that uses
dedicated PME nodes, there will be no PP ranks on nodes that have been set
up with only PME ranks. The two offload models (PP work - GPU; PME work
-
CPU subset) do not work well together, as I said.

One can devise various schemes in 4.6/5.0 that could use those GPUs, but
they either require
* each node does both PME and PP work (thus limiting scaling because of
the
all-to-all for PME, and perhaps making poor use of locality on
multi-socket
nodes), or
* that all nodes have PP ranks, but only some have PME ranks, and the
nodes
map their GPUs to PP ranks in a way that is different depending on whether
PME ranks are present (which could work well, but relies on the DD
load-balancer recognizing and taking advantage of the faster progress of
the PP ranks that have better GPU support, and requires that you get very
dirty hands laying out PP and PME ranks onto hardware that will later
match
the requirements of the DD load balancer, and probably that you balance
PP-PME load manually)

I do not recommend the last approach, because of its complexity.

Clearly there are design decisions to improve. Work is underway.

Cheers,

Mark


On Fri, Aug 22, 2014 at 10:11 AM, Theodore Si sjyz...@gmail.com wrote:

  Hi Mark,

Could you tell me why, when we use GPU-CPU nodes as PME-dedicated
nodes, the GPUs on such nodes will be idle?


Theo

On 8/11/2014 9:36 PM, Mark Abraham wrote:

  Hi,

What Carsten said, if running on nodes that have GPUs.

If running on a mixed setup (some nodes with GPU, some not), then
arranging
your MPI environment to place PME ranks on CPU-only nodes is probably
worthwhile. For example, all your PP ranks first, mapped to GPU nodes,
then
all your PME ranks, mapped to CPU-only nodes, and then use mdrun
-ddorder
pp_pme.

Mark


On Mon, Aug 11, 2014 at 2:45 AM, Theodore Si sjyz...@gmail.com wrote:

   Hi Mark,


This is information of our cluster, could you give us some advice as
regards to our cluster so that we can make GMX run faster on our
system?

Each CPU node has 2 CPUs and each GPU node has 2 CPUs and 2 Nvidia K20M


Device Name Device Type Specifications  Number
CPU NodeIntelH2216JFFKRNodesCPU: 2×Intel Xeon E5-2670(8
Cores,
2.6GHz, 20MB Cache, 8.0GT)
Mem: 64GB(8×8GB) ECC Registered DDR3 1600MHz Samsung Memory 332
Fat NodeIntelH2216WPFKRNodesCPU: 2×Intel Xeon E5-2670(8
Cores,
2.6GHz, 20MB Cache, 8.0GT)
Mem: 256G(16×16G) ECC Registered DDR3 1600MHz Samsung Memory20
GPU Node

Re: [gmx-users] Can we set the number of pure PME nodes when using GPU&CPU?

2014-08-23 Thread Theodore Si

Hi,

When we used 2 GPU nodes (each with 2 CPUs and 2 GPUs) to do an mdrun (with
no PME-dedicated node), we noticed that while the CPUs are doing PME, the GPUs
are idle; that is, they are doing their work sequentially. Is it supposed to
be so? Is it the same reason that GPUs on PME-dedicated nodes won't be used
during a run, as you said before? So if we want to exploit our hardware, we
should map PP-PME ranks manually, right? Say, use one node as a PME-dedicated
node, leave the GPUs on that node idle, and use two nodes to do the other
stuff. What do you think of this arrangement?


Theo

On 8/22/2014 7:20 PM, Mark Abraham wrote:

Hi,

Because no work will be sent to them. The GPU implementation can accelerate
domains from PP ranks on their node, but with an MPMD setup that uses
dedicated PME nodes, there will be no PP ranks on nodes that have been set
up with only PME ranks. The two offload models (PP work - GPU; PME work -
CPU subset) do not work well together, as I said.

One can devise various schemes in 4.6/5.0 that could use those GPUs, but
they either require
* each node does both PME and PP work (thus limiting scaling because of the
all-to-all for PME, and perhaps making poor use of locality on multi-socket
nodes), or
* that all nodes have PP ranks, but only some have PME ranks, and the nodes
map their GPUs to PP ranks in a way that is different depending on whether
PME ranks are present (which could work well, but relies on the DD
load-balancer recognizing and taking advantage of the faster progress of
the PP ranks that have better GPU support, and requires that you get very
dirty hands laying out PP and PME ranks onto hardware that will later match
the requirements of the DD load balancer, and probably that you balance
PP-PME load manually)

I do not recommend the last approach, because of its complexity.

Clearly there are design decisions to improve. Work is underway.

Cheers,

Mark


On Fri, Aug 22, 2014 at 10:11 AM, Theodore Si sjyz...@gmail.com wrote:


Hi Mark,

Could you tell me why, when we use GPU-CPU nodes as PME-dedicated
nodes, the GPUs on such nodes will be idle?


Theo

On 8/11/2014 9:36 PM, Mark Abraham wrote:


Hi,

What Carsten said, if running on nodes that have GPUs.

If running on a mixed setup (some nodes with GPU, some not), then
arranging
your MPI environment to place PME ranks on CPU-only nodes is probably
worthwhile. For example, all your PP ranks first, mapped to GPU nodes,
then
all your PME ranks, mapped to CPU-only nodes, and then use mdrun -ddorder
pp_pme.

Mark


On Mon, Aug 11, 2014 at 2:45 AM, Theodore Si sjyz...@gmail.com wrote:

  Hi Mark,

This is information of our cluster, could you give us some advice as
regards to our cluster so that we can make GMX run faster on our system?

Each CPU node has 2 CPUs and each GPU node has 2 CPUs and 2 Nvidia K20M


Device Name Device Type Specifications  Number
CPU NodeIntelH2216JFFKRNodesCPU: 2×Intel Xeon E5-2670(8
Cores,
2.6GHz, 20MB Cache, 8.0GT)
Mem: 64GB(8×8GB) ECC Registered DDR3 1600MHz Samsung Memory 332
Fat NodeIntelH2216WPFKRNodesCPU: 2×Intel Xeon E5-2670(8
Cores,
2.6GHz, 20MB Cache, 8.0GT)
Mem: 256G(16×16G) ECC Registered DDR3 1600MHz Samsung Memory20
GPU NodeIntelR2208GZ4GC CPU: 2×Intel Xeon E5-2670(8
Cores,
2.6GHz, 20MB Cache, 8.0GT)
Mem: 64GB(8×8GB) ECC Registered DDR3 1600MHz Samsung Memory 50
MIC NodeIntelR2208GZ4GC CPU: 2×Intel Xeon E5-2670(8
Cores,
2.6GHz, 20MB Cache, 8.0GT)
Mem: 64GB(8×8GB) ECC Registered DDR3 1600MHz Samsung Memory 5
Computing Network SwitchMellanox Infiniband FDR Core Switch
648× FDR Core Switch MSX6536-10R, Mellanox Unified Fabric Manager   1
Mellanox SX1036 40Gb Switch 36× 40Gb Ethernet Switch SX1036, 36× QSFP
Interface 1
Management Network Switch   Extreme Summit X440-48t-10G 2-layer
Switch
48× 1Giga Switch Summit X440-48t-10G, authorized by ExtremeXOS   9
Extreme Summit X650-24X 3-layer Switch  24× 10Giga 3-layer Ethernet
Switch
Summit X650-24X, authorized by ExtremeXOS1
Parallel StorageDDN Parallel Storage System DDN SFA12K
Storage
System   1
GPU GPU Accelerator NVIDIA Tesla Kepler K20M70
MIC MIC Intel Xeon Phi 5110P Knights Corner 10
40Gb Ethernet Card  MCX314A-BCBTMellanox ConnextX-3 Chip 40Gb
Ethernet Card
2× 40Gb Ethernet ports, enough QSFP cables  16
SSD Intel SSD910Intel SSD910 Disk, 400GB, PCIE  80







On 8/10/2014 5:50 AM, Mark Abraham wrote:

  That's not what I said You can set...

-npme behaves the same whether or not GPUs are in use. Using separate
ranks
for PME caters to trying to minimize the cost of the all-to-all
communication of the 3DFFT. That's still relevant when using GPUs, but
if
separate PME ranks are used, any GPUs on nodes that only have PME ranks
are
left idle. The most effective approach depends critically on the
hardware
and simulation setup, and whether you pay money

[gmx-users] Why we are not using GPU to solve FFT?

2014-08-22 Thread Theodore Si

Hi,

I wonder why we are using cpu instead of gpu to solve FFT? Is it 
possible to use gpu fft library, say cuFFT to make the FFT used in PME 
faster?


BR,
Theo


Re: [gmx-users] Can we set the number of pure PME nodes when using GPU&CPU?

2014-08-22 Thread Theodore Si

Hi Mark,

Could you tell me why, when we use GPU-CPU nodes as PME-dedicated
nodes, the GPUs on such nodes will be idle?


Theo

On 8/11/2014 9:36 PM, Mark Abraham wrote:

Hi,

What Carsten said, if running on nodes that have GPUs.

If running on a mixed setup (some nodes with GPU, some not), then arranging
your MPI environment to place PME ranks on CPU-only nodes is probably
worthwhile. For example, all your PP ranks first, mapped to GPU nodes, then
all your PME ranks, mapped to CPU-only nodes, and then use mdrun -ddorder
pp_pme.

Mark
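The placement suggested here (PP ranks first on GPU nodes, then PME ranks on CPU-only
nodes, combined with mdrun -ddorder pp_pme) can be sketched as a rank layout. The node
names and rank counts below are assumptions for illustration:

```python
# Sketch of the MPI rank ordering for "mdrun -ddorder pp_pme": with that
# option, ranks [0, n_pp) become PP ranks and [n_pp, n_total) become PME
# ranks, so listing the GPU nodes first in the machinefile puts all PP
# (and hence GPU) work on the GPU nodes. Hostnames are hypothetical.
def rank_layout(gpu_nodes, cpu_nodes, ranks_per_node):
    """Return (rank, host, role) tuples with PP ranks first on GPU nodes."""
    hosts = [n for n in gpu_nodes for _ in range(ranks_per_node)] + \
            [n for n in cpu_nodes for _ in range(ranks_per_node)]
    n_pp = len(gpu_nodes) * ranks_per_node
    return [(r, host, "PP" if r < n_pp else "PME")
            for r, host in enumerate(hosts)]

layout = rank_layout(["gpu01", "gpu02"], ["cpu01"], ranks_per_node=4)
print(layout[0], layout[-1])
```

With this ordering, a run would use the assumed layout with something like
"mdrun -ddorder pp_pme -npme 4" so the last four ranks (on the CPU-only node) do PME.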


On Mon, Aug 11, 2014 at 2:45 AM, Theodore Si sjyz...@gmail.com wrote:


Hi Mark,

This is information of our cluster, could you give us some advice as
regards to our cluster so that we can make GMX run faster on our system?

Each CPU node has 2 CPUs and each GPU node has 2 CPUs and 2 Nvidia K20M


Device Name / Device Type / Specifications / Number

CPU Node (Intel H2216JFFKR): 2× Intel Xeon E5-2670 (8 cores, 2.6 GHz, 20 MB cache,
8.0 GT); 64 GB (8×8 GB) ECC Registered DDR3 1600 MHz Samsung memory. Qty: 332
Fat Node (Intel H2216WPFKR): 2× Intel Xeon E5-2670 (8 cores, 2.6 GHz, 20 MB cache,
8.0 GT); 256 GB (16×16 GB) ECC Registered DDR3 1600 MHz Samsung memory. Qty: 20
GPU Node (Intel R2208GZ4GC): 2× Intel Xeon E5-2670 (8 cores, 2.6 GHz, 20 MB cache,
8.0 GT); 64 GB (8×8 GB) ECC Registered DDR3 1600 MHz Samsung memory. Qty: 50
MIC Node (Intel R2208GZ4GC): 2× Intel Xeon E5-2670 (8 cores, 2.6 GHz, 20 MB cache,
8.0 GT); 64 GB (8×8 GB) ECC Registered DDR3 1600 MHz Samsung memory. Qty: 5
Computing Network Switch: Mellanox Infiniband FDR core switch; 648-port FDR core
switch MSX6536-10R, Mellanox Unified Fabric Manager. Qty: 1
Mellanox SX1036 40Gb Switch: 36-port 40Gb Ethernet switch SX1036, 36× QSFP
interfaces. Qty: 1
Management Network Switch: Extreme Summit X440-48t-10G layer-2 switch; 48-port
1Gb switch Summit X440-48t-10G, licensed ExtremeXOS. Qty: 9
Extreme Summit X650-24X layer-3 switch: 24-port 10Gb layer-3 Ethernet switch
Summit X650-24X, licensed ExtremeXOS. Qty: 1
Parallel Storage: DDN parallel storage system, DDN SFA12K storage system. Qty: 1
GPU: GPU accelerator, NVIDIA Tesla Kepler K20M. Qty: 70
MIC: Intel Xeon Phi 5110P Knights Corner. Qty: 10
40Gb Ethernet Card: MCX314A-BCBT Mellanox ConnectX-3 chip 40Gb Ethernet card,
2× 40Gb Ethernet ports, enough QSFP cables. Qty: 16
SSD: Intel SSD910 disk, 400 GB, PCIe. Qty: 80







On 8/10/2014 5:50 AM, Mark Abraham wrote:


That's not what I said. I said "You can set..."

-npme behaves the same whether or not GPUs are in use. Using separate
ranks
for PME caters to trying to minimize the cost of the all-to-all
communication of the 3DFFT. That's still relevant when using GPUs, but if
separate PME ranks are used, any GPUs on nodes that only have PME ranks
are
left idle. The most effective approach depends critically on the hardware
and simulation setup, and whether you pay money for your hardware.

Mark


On Sat, Aug 9, 2014 at 2:56 AM, Theodore Si sjyz...@gmail.com wrote:

  Hi,

You mean that no matter whether we use GPU acceleration or not, -npme is just a
suggestion?
Why can't we set it to an exact value?


On 8/9/2014 5:14 AM, Mark Abraham wrote:

  You can set the number of PME-only ranks with -npme. Whether it's useful

is
another matter :-) The CPU-based PME offload and the GPU-based PP
offload
do not combine very well.

Mark


On Fri, Aug 8, 2014 at 7:24 AM, Theodore Si sjyz...@gmail.com wrote:

   Hi,


Can we set the number manually with -npme when using GPU acceleration?











Re: [gmx-users] CUDA5.5 and ICC 14

2014-08-21 Thread Theodore Si

Thanks!

On 8/20/2014 8:03 PM, Mark Abraham wrote:

Hi,

Per Nvidia's docs and that error, CUDA 5.5 is only supported with that icc
version. In practice, I understand that other icc versions work fine with
GROMACS, but you would need to go and comment out that #error message.
Alternatively, CUDA 6.5 supports icc 14.0 (only).

Mark


On Wed, Aug 20, 2014 at 8:26 AM, Theodore Si sjyz...@gmail.com wrote:


Hi,

I am using CUDA 5.5 and Intel ICC 14.0.1 to compile GROMACS and this
happened:

[  0%] Building NVCC (Device) object src/gromacs/gmxlib/gpu_utils/
CMakeFiles/gpu_utils.dir//./gpu_utils_generated_gpu_utils.cu.o
In file included from /usr/local/cuda-5.5/include/cuda_runtime.h(59),
  from /home/theo/gromacs-5.0/src/gromacs/gmxlib/gpu_utils/
gpu_utils.cu(0):
/usr/local/cuda-5.5/include/host_config.h(72): catastrophic error: #error
directive: -- unsupported ICC configuration! Only ICC 12.1 on Linux x86_64
is supported!
   #error -- unsupported ICC configuration! Only ICC 12.1 on Linux x86_64
is supported!
^

CMake Error at gpu_utils_generated_gpu_utils.cu.o.cmake:198 (message):
   Error generating
/home/theo/gromacs-5.0/build/src/gromacs/gmxlib/gpu_utils/
CMakeFiles/gpu_utils.dir//./gpu_utils_generated_gpu_utils.cu.o


make[2]: *** [src/gromacs/gmxlib/gpu_utils/CMakeFiles/gpu_utils.dir/./
gpu_utils_generated_gpu_utils.cu.o] Error 1
make[1]: *** [src/gromacs/gmxlib/gpu_utils/CMakeFiles/gpu_utils.dir/all]
Error 2
make: *** [all] Error 2

Does this mean that I cannot make them work together?

BR,
Theo





[gmx-users] CUDA5.5 and ICC 14

2014-08-20 Thread Theodore Si

Hi,

I am using CUDA 5.5 and Intel ICC 14.0.1 to compile GROMACS and this 
happened:


[  0%] Building NVCC (Device) object 
src/gromacs/gmxlib/gpu_utils/CMakeFiles/gpu_utils.dir//./gpu_utils_generated_gpu_utils.cu.o

In file included from /usr/local/cuda-5.5/include/cuda_runtime.h(59),
 from 
/home/theo/gromacs-5.0/src/gromacs/gmxlib/gpu_utils/gpu_utils.cu(0):
/usr/local/cuda-5.5/include/host_config.h(72): catastrophic error: 
#error directive: -- unsupported ICC configuration! Only ICC 12.1 on 
Linux x86_64 is supported!
  #error -- unsupported ICC configuration! Only ICC 12.1 on Linux 
x86_64 is supported!

   ^

CMake Error at gpu_utils_generated_gpu_utils.cu.o.cmake:198 (message):
  Error generating
/home/theo/gromacs-5.0/build/src/gromacs/gmxlib/gpu_utils/CMakeFiles/gpu_utils.dir//./gpu_utils_generated_gpu_utils.cu.o


make[2]: *** 
[src/gromacs/gmxlib/gpu_utils/CMakeFiles/gpu_utils.dir/./gpu_utils_generated_gpu_utils.cu.o] 
Error 1
make[1]: *** [src/gromacs/gmxlib/gpu_utils/CMakeFiles/gpu_utils.dir/all] 
Error 2

make: *** [all] Error 2

Does this mean that I cannot make them work together?

BR,
Theo


Re: [gmx-users] Can we set the number of pure PME nodes when using GPU+CPU?

2014-08-19 Thread Theodore Si

Hi,

How can we designate certain CPU-only nodes as PME-dedicated nodes? 
Which mdrun options or configuration should we use to make that happen?


BR,
Theo

On 8/11/2014 9:36 PM, Mark Abraham wrote:

Hi,

What Carsten said, if running on nodes that have GPUs.

If running on a mixed setup (some nodes with GPU, some not), then arranging
your MPI environment to place PME ranks on CPU-only nodes is probably
worthwhile. For example, all your PP ranks first, mapped to GPU nodes, then
all your PME ranks, mapped to CPU-only nodes, and then use mdrun -ddorder
pp_pme.

Mark
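One way to realize the placement Mark describes with Open MPI is sketched below. This is hypothetical: the hostnames (gpu01, gpu02, cpu01) and all rank counts are invented for illustration, and the actual launch line is shown commented out; adapt the hostfile syntax to your MPI implementation.

```shell
# Two GPU nodes take the PP ranks, one CPU-only node takes the PME ranks.
cat > hostfile <<'EOF'
gpu01 slots=2
gpu02 slots=2
cpu01 slots=4
EOF
# With -ddorder pp_pme, mdrun numbers all PP ranks before all PME ranks, so
# the 4 PP ranks land on the GPU nodes listed first and the 4 PME-only ranks
# (from -npme 4) land on cpu01:
# mpirun --hostfile hostfile -np 8 mdrun_mpi -deffnm test -npme 4 -ddorder pp_pme
echo "hostfile written"
```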


On Mon, Aug 11, 2014 at 2:45 AM, Theodore Si sjyz...@gmail.com wrote:


Hi Mark,

This is information about our cluster. Could you give us some advice
regarding our cluster so that we can make GROMACS run faster on our system?

Each CPU node has 2 CPUs and each GPU node has 2 CPUs and 2 Nvidia K20M


Device (type, quantity): specifications

- CPU Node (Intel H2216JFFKR, 332): CPU 2× Intel Xeon E5-2670 (8 cores, 2.6 GHz, 20 MB cache, 8.0 GT/s); Mem 64 GB (8×8 GB) ECC Registered DDR3 1600 MHz Samsung
- Fat Node (Intel H2216WPFKR, 20): CPU 2× Intel Xeon E5-2670 (8 cores, 2.6 GHz, 20 MB cache, 8.0 GT/s); Mem 256 GB (16×16 GB) ECC Registered DDR3 1600 MHz Samsung
- GPU Node (Intel R2208GZ4GC, 50): CPU 2× Intel Xeon E5-2670 (8 cores, 2.6 GHz, 20 MB cache, 8.0 GT/s); Mem 64 GB (8×8 GB) ECC Registered DDR3 1600 MHz Samsung
- MIC Node (Intel R2208GZ4GC, 5): CPU 2× Intel Xeon E5-2670 (8 cores, 2.6 GHz, 20 MB cache, 8.0 GT/s); Mem 64 GB (8×8 GB) ECC Registered DDR3 1600 MHz Samsung
- Computing network switch (1): Mellanox InfiniBand FDR core switch MSX6536-10R, 648× FDR ports, Mellanox Unified Fabric Manager
- Mellanox SX1036 40 Gb switch (1): 36-port 40 Gb Ethernet switch, 36× QSFP interfaces
- Management network switch (9): Extreme Summit X440-48t-10G layer-2 switch, 48× 1 GbE ports, ExtremeXOS
- Extreme Summit X650-24X layer-3 switch (1): 24× 10 GbE Ethernet switch, ExtremeXOS
- Parallel storage (1): DDN SFA12K storage system
- GPU accelerator (70): NVIDIA Tesla Kepler K20M
- MIC (10): Intel Xeon Phi 5110P, Knights Corner
- 40 Gb Ethernet card (16): Mellanox ConnectX-3 MCX314A-BCBT, 2× 40 Gb Ethernet ports, with QSFP cables
- SSD (80): Intel SSD 910 disk, 400 GB, PCIe







On 8/10/2014 5:50 AM, Mark Abraham wrote:


That's not what I said. I said "You can set..."

-npme behaves the same whether or not GPUs are in use. Using separate
ranks
for PME caters to trying to minimize the cost of the all-to-all
communication of the 3DFFT. That's still relevant when using GPUs, but if
separate PME ranks are used, any GPUs on nodes that only have PME ranks
are
left idle. The most effective approach depends critically on the hardware
and simulation setup, and whether you pay money for your hardware.

Mark


On Sat, Aug 9, 2014 at 2:56 AM, Theodore Si sjyz...@gmail.com wrote:

  Hi,

You mean that whether or not we use GPU acceleration, -npme is just a
reference value?
Why can't we set it to an exact value?


On 8/9/2014 5:14 AM, Mark Abraham wrote:

  You can set the number of PME-only ranks with -npme. Whether it's useful

is
another matter :-) The CPU-based PME offload and the GPU-based PP
offload
do not combine very well.

Mark


On Fri, Aug 8, 2014 at 7:24 AM, Theodore Si sjyz...@gmail.com wrote:

   Hi,


Can we set the number manually with -npme when using GPU acceleration?











Re: [gmx-users] Can we set the number of pure PME nodes when using GPU+CPU?

2014-08-11 Thread Theodore Si

Hi Mark,

This is information about our cluster. Could you give us some advice
regarding our cluster so that we can make GROMACS run faster on our system?


Each CPU node has 2 CPUs and each GPU node has 2 CPUs and 2 Nvidia K20M


Device (type, quantity): specifications

- CPU Node (Intel H2216JFFKR, 332): CPU 2× Intel Xeon E5-2670 (8 cores, 2.6 GHz, 20 MB cache, 8.0 GT/s); Mem 64 GB (8×8 GB) ECC Registered DDR3 1600 MHz Samsung
- Fat Node (Intel H2216WPFKR, 20): CPU 2× Intel Xeon E5-2670 (8 cores, 2.6 GHz, 20 MB cache, 8.0 GT/s); Mem 256 GB (16×16 GB) ECC Registered DDR3 1600 MHz Samsung
- GPU Node (Intel R2208GZ4GC, 50): CPU 2× Intel Xeon E5-2670 (8 cores, 2.6 GHz, 20 MB cache, 8.0 GT/s); Mem 64 GB (8×8 GB) ECC Registered DDR3 1600 MHz Samsung
- MIC Node (Intel R2208GZ4GC, 5): CPU 2× Intel Xeon E5-2670 (8 cores, 2.6 GHz, 20 MB cache, 8.0 GT/s); Mem 64 GB (8×8 GB) ECC Registered DDR3 1600 MHz Samsung
- Computing network switch (1): Mellanox InfiniBand FDR core switch MSX6536-10R, 648× FDR ports, Mellanox Unified Fabric Manager
- Mellanox SX1036 40 Gb switch (1): 36-port 40 Gb Ethernet switch, 36× QSFP interfaces
- Management network switch (9): Extreme Summit X440-48t-10G layer-2 switch, 48× 1 GbE ports, ExtremeXOS
- Extreme Summit X650-24X layer-3 switch (1): 24× 10 GbE Ethernet switch, ExtremeXOS
- Parallel storage (1): DDN SFA12K storage system
- GPU accelerator (70): NVIDIA Tesla Kepler K20M
- MIC (10): Intel Xeon Phi 5110P, Knights Corner
- 40 Gb Ethernet card (16): Mellanox ConnectX-3 MCX314A-BCBT, 2× 40 Gb Ethernet ports, with QSFP cables
- SSD (80): Intel SSD 910 disk, 400 GB, PCIe






On 8/10/2014 5:50 AM, Mark Abraham wrote:

That's not what I said. I said "You can set..."

-npme behaves the same whether or not GPUs are in use. Using separate ranks
for PME caters to trying to minimize the cost of the all-to-all
communication of the 3DFFT. That's still relevant when using GPUs, but if
separate PME ranks are used, any GPUs on nodes that only have PME ranks are
left idle. The most effective approach depends critically on the hardware
and simulation setup, and whether you pay money for your hardware.

Mark


On Sat, Aug 9, 2014 at 2:56 AM, Theodore Si sjyz...@gmail.com wrote:


Hi,

You mean that whether or not we use GPU acceleration, -npme is just a
reference value?
Why can't we set it to an exact value?


On 8/9/2014 5:14 AM, Mark Abraham wrote:


You can set the number of PME-only ranks with -npme. Whether it's useful
is
another matter :-) The CPU-based PME offload and the GPU-based PP offload
do not combine very well.

Mark


On Fri, Aug 8, 2014 at 7:24 AM, Theodore Si sjyz...@gmail.com wrote:

  Hi,

Can we set the number manually with -npme when using GPU acceleration?










Re: [gmx-users] Can we set the number of pure PME nodes when using GPU+CPU?

2014-08-09 Thread Theodore Si

Hi,

You mean that whether or not we use GPU acceleration, -npme is just a 
reference value?

Why can't we set it to an exact value?

On 8/9/2014 5:14 AM, Mark Abraham wrote:

You can set the number of PME-only ranks with -npme. Whether it's useful is
another matter :-) The CPU-based PME offload and the GPU-based PP offload
do not combine very well.

Mark


On Fri, Aug 8, 2014 at 7:24 AM, Theodore Si sjyz...@gmail.com wrote:


Hi,

Can we set the number manually with -npme when using GPU acceleration?







[gmx-users] Are there some environment variables not valid anymore?

2014-08-08 Thread Theodore Si

Hi,

I found this in the installation instructions for GROMACS 5.0:


 Helping CMake find the right libraries/headers/programs
 http://www.gromacs.org/Documentation/Installation_Instructions#TOC

If libraries are installed in non-default locations their location can 
be specified using the following environment variables:


 * CMAKE_INCLUDE_PATH for header files
 * CMAKE_LIBRARY_PATH for libraries
 * CMAKE_PREFIX_PATH for headers, libraries and binaries
   (e.g. /usr/local).



However, when I set CMAKE_INCLUDE_PATH with -DCMAKE_INCLUDE_PATH, I 
get this in CMakeCache.txt:


 209 //No help, variable specified on the command line.
 210 
CMAKE_INCLUDE_PATH:UNINITIALIZED=/home/theo/myprg/vt/include/vampirtrace/


Why?
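For reference, the workflow the quoted docs describe, exporting the variables in the environment before invoking cmake, might look like the sketch below. The include path is taken from the post; the lib path and the commented cmake line are my own assumptions. As far as I know, the UNINITIALIZED marker in CMakeCache.txt just means the variable was given on the command line without a declared type, which is informational rather than an error.

```shell
# Export the search paths before running cmake, as the docs describe.
export CMAKE_INCLUDE_PATH=/home/theo/myprg/vt/include/vampirtrace
export CMAKE_LIBRARY_PATH=/home/theo/myprg/vt/lib   # assumed location
# cmake .. -DGMX_MPI=ON -DCMAKE_INSTALL_PREFIX=$HOME/gmx
echo "$CMAKE_INCLUDE_PATH"
```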



[gmx-users] How to profile GMX?

2014-08-07 Thread Theodore Si

Hi,

I would like to know what kind of profiling tools can be used with GMX?
Which is the most commonly used one?




[gmx-users] Why is there a NxN VdW [F] on a separate line?

2014-08-05 Thread Theodore Si
This is extracted from a log file of an mdrun with 512 OpenMP threads 
without GPU acceleration. Since the first line and the third line both have 
NxN VdW [F], does the former include the latter?



As we can see, in the log file of an mdrun with 8 OpenMP threads without 
GPU acceleration, there is no standalone NxN VdW [F]. Why the difference?





Re: [gmx-users] Why is there a NxN VdW [F] on a separate line?

2014-08-05 Thread Theodore Si
Please compare the files 8.log 
https://onedrive.live.com/redir?resid=990FCE59E48164A4%212481authkey=%21AI9ThbRY_7ZgAg8ithint=file%2clog 
and 512.log 
https://onedrive.live.com/redir?resid=990FCE59E48164A4%212482authkey=%21APLkizOBzXtPHxsithint=file%2clog.
Their M E G A - F L O P S   A C C O U N T I N G parts differ: 8.log has no 
standalone NxN VdW [F] or NxN VdW [VF], while 512.log has the following 
lines.


NxN VdW [F] 17.077648 563.562 0.0
NxN VdW [VF]0.002592 0.111 0.0

Why the difference?

And both have

 NxN Ewald Elec. + VdW [F]
 NxN Ewald Elec. + VdW [VF]

Does NxN Ewald Elec. + VdW [F] mean NxN Ewald Elec. plus NxN VdW [F]? 
If that is the case, why does 512.log have both NxN Ewald Elec. + VdW [F] and 
NxN VdW [F]?


On 8/5/2014 10:11 PM, Mark Abraham wrote:

On Tue, Aug 5, 2014 at 4:00 AM, Theodore Si sjyz...@gmail.com wrote:


This is extracted from a log file


There's no data. The list cannot accept attachments, so you need to
copy-paste a relevant chunk, or upload a log file to a file-sharing service.



of a mdrun of 512 openMP threads without GPU acceleration.


mdrun will refuse to run with 512 OpenMP threads - please report your mdrun
command line rather than your mental model of it.



Since the first line and third line both have N*N Vdw [F], does the former
include the latter?


No, but there is no line with N*N Vdw [F]. Please be precise if you are
asking for detailed information.

As we can see, in the log file of a mdrun of 8 openMP threads without GPU

acceleration, there is no standalone N*N Vdw [F], why the difference?


Can't tell, don't know what is different between the two runs. My guess is
that the former run is actually running on 64 MPI ranks, each of 8 OpenMP
threads, in which case you have domain decomposition per MPI rank, and in
that case there are separate calls to kernels that are aimed at computing
the interactions associated with atoms whose home is in different domains.
You should see the ratio vary as the number of ranks varies.

Mark
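Mark's point, that with domain decomposition separate kernel calls handle interactions whose atoms live in different domains, can be illustrated with a toy model. This is my own illustration, not GROMACS code; the chain geometry and fake cutoff are invented.

```python
# Toy model: atoms on a 1D chain interact within a short fake cutoff.
# "Local" pairs have both atoms in the same domain; "nonlocal" pairs cross
# a domain boundary. More domains -> a larger nonlocal share, so the kernel
# ratios in the MEGA-FLOPS accounting vary with the number of ranks.
def nonlocal_fraction(n_atoms, n_domains):
    domain = lambda i: i * n_domains // n_atoms   # equal-size 1D domains
    local = cross = 0
    for i in range(n_atoms):
        for j in range(i + 1, min(i + 3, n_atoms)):  # fake short-range cutoff
            if domain(i) == domain(j):
                local += 1
            else:
                cross += 1
    return cross / (local + cross)

print(nonlocal_fraction(1000, 1))    # 0.0 -- a single domain: all pairs local
print(nonlocal_fraction(1000, 8) < nonlocal_fraction(1000, 64))  # True
```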












[gmx-users] Error occurs when compiling gromacs using vtcc

2014-08-01 Thread Theodore Si
Hi all,

Does anyone know the instrumentation tool Vampir Trace? I am using it,
and I want to instrument the gromacs code.
So my cmake options are:

cmake .. -DCMAKE_BUILD_OWN_FFTW=ON -DCMAKE_C_COMPILER=vtcc
-DCMAKE_CXX_COMPILER=vtcxx -DGMX_MPI=on
-DCMAKE_INSTALL_PREFIX=/home/theo/gmx
-CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-5.5

Vampir Trace includes some binaries, like vtcc and vtcxx above. They
behave like compilers but are actually compiler wrappers.
For example, you can write main.c and run vtcc main.c -o main; the
binary main will be built and can be run like a normal binary, but
during the run some trace files are generated that can be analysed later.

However, an error occurs:

[ 2%] Building C object
src/gromacs/CMakeFiles/libgromacs.dir/__/external/tng_io/src/compression/tng_compress.c.o
/home/theo/gromacs-5.0/src/external/tng_io/src/compression/tng_compress.c:25:
error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘speed’
/home/theo/gromacs-5.0/src/external/tng_io/src/compression/tng_compress.c:
In function ‘quantize’:
/home/theo/gromacs-5.0/src/external/tng_io/src/compression/tng_compress.c:80:
warning: implicit declaration of function ‘verify_input_data’
At top level:
cc1: warning: unrecognized command line option -Wno-maybe-uninitialized
make[2]: ***
[src/gromacs/CMakeFiles/libgromacs.dir/__/external/tng_io/src/compression/tng_compress.c.o]
Error 1
make[1]: *** [src/gromacs/CMakeFiles/libgromacs.dir/all] Error 2
make: *** [all] Error 2

When I open tng_compress.c, I find that line 25 is just part of a comment.

24 #define SPEED_DEFAULT 2 /* Default to relatively fast compression.
For very good compression it makes sense to
25 choose speed=4 or speed=5 */
26

Could you tell me what's going on?

I'd appreciate your kind help.


Re: [gmx-users] Some columns in log file.

2014-07-29 Thread Theodore Si

Thanks a lot!

On 2014-07-29 19:53:19, Justin Lemkul wrote:



On 7/28/14, 10:11 PM, Theodore Si wrote:

For example, a table that explains the meanings of all the items in
the log file.
I found this page
(http://www.gromacs.org/Documentation/Tutorials/GROMACS_USA_Workshop_and_Conference_2013/Topology_preparation,_%22What's_in_a_log_file%22,_basic_performance_improvements%3A_Mark_Abraham,_Session_1A)
yesterday; it gives some information about the log file, but I think it's
inadequate.

Computing:
---------------------------------------------------------------
Domain decomp.       Doing DD.
DD comm. load
DD comm. bounds      DD communication stuff.
Send X to PME        Coordinate communication for PME.
Neighbor search      Self-explanatory.
Comm. coord.         Communicating coordinates.
Force                Doing forces.
Wait + Comm. F       Time spent waiting for, and communicating forces.
PME mesh             Self-explanatory.
PME wait for PP      Time spent on PME waiting for particle-particle
                     interactions.
Wait + Recv. PME F   Time spent waiting and receiving PME forces.
NB X/F buffer ops.   Nonbonded forces and coordinate operations.
Write traj.          Self-explanatory.
Update               Application of all relevant updating functions.
Constraints          Self-explanatory.
Comm. energies       Communicating energy information.
Rest                 Everything else.
---------------------------------------------------------------
Total
---------------------------------------------------------------
PME redist. X/F
PME spread/gather
PME 3D-FFT
PME 3D-FFT Comm.
PME solve            A bunch of stuff related to how PME works,
                     communicating data, etc.
---------------------------------------------------------------


Could you explain the meaning of each of these items? In particular, what
does X mean?



X means coordinates.  F means forces.  Most of these terms are
technical specs that determine how much time was spent doing what
during the run.  Is there something in these terms that concerns you?
Something indicative of bad performance?

-Justin




Re: [gmx-users] How do I get to know the meaning of the first column in log file?

2014-07-28 Thread Theodore Si

   Core t (s)   Wall t (s)(%)
   Time: 2345.800   49.744 4715.7
 (ns/day)(hour/ns)
Performance:   69.4790.345


What does 0.345 hour/ns stand for? And the Wall time of 49.744 s?


On 2014/7/28 14:53, Mark Abraham wrote:

I plan to put some more of this kind of documentation in the upcoming User
Guide, but it isn't done yet.

Number of times that section was entered, the total wall time spent in that
section, and the total number of processor gigacycles spent in that
section, and percentage of same. Some columns are useful for only some
kinds of comparisons.

Mark
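Mark's column descriptions can be sanity-checked with the Total row of the accounting table quoted further down in this message. This is my own arithmetic, not anything from the log itself; the roughly 2.6 GHz result happens to match the Xeon E5-2670 clock rate mentioned elsewhere on this list.

```python
# Numbers from the "Total" row: 24 ranks x 2 OpenMP threads = 48 threads.
core_t, wall_t = 2345.800, 49.744      # Core t (s), Wall t (s)
gcycles_total  = 6192.746              # G-Cycles
threads        = 24 * 2

# The (%) column: core time summed over all threads, relative to wall time.
print(round(core_t / wall_t * 100, 1))               # 4715.7

# G-Cycles / (wall time x threads) recovers the clock rate in GHz.
print(round(gcycles_total / (wall_t * threads), 2))  # 2.59, i.e. ~2.6 GHz
```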


On Mon, Jul 28, 2014 at 5:35 AM, Theodore Si sjyz...@gmail.com wrote:


Thanks a lot!
But I am still confused about other things.
For instance, what do Count, Wall t (s), and G-Cycles mean? It seems that
the last column is the percentage of G-Cycles.
I really hope there is a place where I can find all the relevant
information about the log file.

On 2014/7/28 11:12, Mark Abraham wrote:

  On Jul 28, 2014 4:53 AM, Theodore Si sjyz...@gmail.com wrote:

For example, in the following form, what does Wait + Comm. F mean? Is
there a webpage that explains the forms in log file?


Unfortunately not (yet), but they correspond in a more-or-less clear way
to
the segments in manual figure 3.16. In this case, to the three boxes below
Evaluate potential/forces. Significant time spent here would indicate
poor balance of compute load.

Mark

R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:          Nodes  Th.  Count   Wall t (s)   G-Cycles      %
-----------------------------------------------------------------------
 Domain decomp.         24    2    801        2.827    351.962    5.7
 DD comm. load          24    2    800        0.077      9.604    0.2
 DD comm. bounds        24    2    800        0.354     44.014    0.7
 Neighbor search        24    2    801        1.077    134.117    2.2
 Launch GPU ops.        24    2  40002        1.518    189.021    3.1
 Comm. coord.           24    2  19200        2.009    250.121    4.0
 Force                  24    2  20001        8.478   1055.405   17.0
 Wait + Comm. F         24    2  20001        1.967    244.901    4.0
 PME mesh               24    2  20001       24.064   2995.784   48.4
 Wait GPU nonlocal      24    2  20001        0.170     21.212    0.3
 Wait GPU local         24    2  20001        0.072      8.935    0.1
 NB X/F buffer ops.     24    2  78402        0.627     78.050    1.3
 Write traj.            24    2      2        0.037      4.569    0.1
 Update                 24    2  20001        0.497     61.874    1.0
 Constraints            24    2  20001        4.198    522.645    8.4
 Comm. energies         24    2    801        0.293     36.500    0.6
 Rest                   24                    1.478    184.033    3.0
-----------------------------------------------------------------------
 Total                  24                   49.744   6192.746  100.0
-----------------------------------------------------------------------
 PME redist. X/F        24    2  40002        6.199    771.670   12.5
 PME spread/gather      24    2  40002        7.194    895.557   14.5
 PME 3D-FFT             24    2  40002        2.727    339.480    5.5
 PME 3D-FFT Comm.       24    2  80004        7.460    928.742   15.0
 PME solve              24    2  20001        0.434     53.968    0.9
-----------------------------------------------------------------------









Re: [gmx-users] How do I get to know the meaning of the first column in log file?

2014-07-28 Thread Theodore Si
I thought that 69.479 ns/day means I can simulate 69.479 ns per day. But 
if, as you said, I need 0.345 hours to get a simulated nanosecond, then can 
I only get 0.345 * 24 = 8.28 simulated nanoseconds per day? Then what 
does 69.479 mean? And the Core t (s)?



On 2014/7/28 15:29, Mark Abraham wrote:

Your run took nearly a minute, and did so at a rate that would take 0.345
hours to do a simulated nanosecond

Mark


On Mon, Jul 28, 2014 at 9:05 AM, Theodore Si sjyz...@gmail.com wrote:


Core t (s)   Wall t (s)(%)
Time: 2345.800   49.744 4715.7
  (ns/day)(hour/ns)
Performance:   69.4790.345


What does 0.345 hour/ns stand for? and the Wall time 49.77s?


On 2014/7/28 14:53, Mark Abraham wrote:

  I plan to put some more of this kind of documentation in the upcoming User

Guide, but it isn't done yet.

Number of times that section was entered, the total wall time spent in
that
section, and the total number of processor gigacycles spent in that
section, and percentage of same. Some columns are useful for only some
kinds of comparisons.

Mark


On Mon, Jul 28, 2014 at 5:35 AM, Theodore Si sjyz...@gmail.com wrote:

  Thanks a lot!

But I am still confused about other things.
For instance, what do Count, Wall t (s), and G-Cycles mean? It seems that
the last column is the percentage of G-Cycles.
I really hope there is a place where I can find all the relevant
information about the log file.

On 2014/7/28 11:12, Mark Abraham wrote:

   On Jul 28, 2014 4:53 AM, Theodore Si sjyz...@gmail.com wrote:


For example, in the following form, what does Wait + Comm. F mean? Is

there a webpage that explains the forms in log file?

  Unfortunately not (yet), but they correspond in a more-or-less clear

way
to
the segments in manual figure 3.16. In this case, to the three boxes
below
Evaluate potential/forces. Significant time spent here would indicate
poor balance of compute load.

Mark

R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:          Nodes  Th.  Count   Wall t (s)   G-Cycles      %
-----------------------------------------------------------------------
 Domain decomp.         24    2    801        2.827    351.962    5.7
 DD comm. load          24    2    800        0.077      9.604    0.2
 DD comm. bounds        24    2    800        0.354     44.014    0.7
 Neighbor search        24    2    801        1.077    134.117    2.2
 Launch GPU ops.        24    2  40002        1.518    189.021    3.1
 Comm. coord.           24    2  19200        2.009    250.121    4.0
 Force                  24    2  20001        8.478   1055.405   17.0
 Wait + Comm. F         24    2  20001        1.967    244.901    4.0
 PME mesh               24    2  20001       24.064   2995.784   48.4
 Wait GPU nonlocal      24    2  20001        0.170     21.212    0.3
 Wait GPU local         24    2  20001        0.072      8.935    0.1
 NB X/F buffer ops.     24    2  78402        0.627     78.050    1.3
 Write traj.            24    2      2        0.037      4.569    0.1
 Update                 24    2  20001        0.497     61.874    1.0
 Constraints            24    2  20001        4.198    522.645    8.4
 Comm. energies         24    2    801        0.293     36.500    0.6
 Rest                   24                    1.478    184.033    3.0
-----------------------------------------------------------------------
 Total                  24                   49.744   6192.746  100.0
-----------------------------------------------------------------------
 PME redist. X/F        24    2  40002        6.199    771.670   12.5
 PME spread/gather      24    2  40002        7.194    895.557   14.5
 PME 3D-FFT             24    2  40002        2.727    339.480    5.5
 PME 3D-FFT Comm.       24    2  80004        7.460    928.742   15.0
 PME solve              24    2  20001        0.434     53.968    0.9
-----------------------------------------------------------------------










Re: [gmx-users] How do I get to know the meaning of the first column in log file?

2014-07-28 Thread Theodore Si

Hi all,

Could anyone explain all the meanings of these columns in log?
I'd appreciate your help.

Domain decomp.
DD comm. load
DD comm. bounds
Send X to PME
Neighbor search
Comm. coord.
Force
Wait + Comm. F
PME mesh
PME wait for PP
Wait + Recv. PME F
NB X/F buffer ops.
Write traj.
Update
Constraints
Comm. energies
Rest
--
Total
--
--
PME redist. X/F
PME spread/gather
PME 3D-FFT
PME 3D-FFT Comm.
PME solve




On 2014-07-28 15:51:15, Guillaume Chevrot wrote:

Hi,

if you need 0.345 hours to get 1 ns, that means you can simulate
1/0.345 ~ 2.9 ns/hour, so in one day you will simulate (1/0.345)*24 ~ 69.5 ns
Guillaume
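Guillaume's conversion in a couple of lines of Python (trivial, but it makes the reciprocal relationship between hour/ns and ns/day explicit):

```python
# hour/ns and ns/day are reciprocal views of the same simulation rate.
hours_per_ns = 0.345
ns_per_day = 24 / hours_per_ns
print(round(ns_per_day, 1))  # 69.6 -- close to the reported 69.479 ns/day
                             # (the gap is just rounding of 0.345)
```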

On 07/28/2014 09:39 AM, Theodore Si wrote:

I thought that 69.479 ns/day means I can simulate 69.479 ns per day.
But if as you said, I need 0.345 hour to get a simulated nanosecond,
then I can only get 0.345 * 24 = 8.28 simulated nanosecond per day?
Then what does 69.479 mean? And the Core t(s)?


On 2014/7/28 15:29, Mark Abraham wrote:

Your run took nearly a minute, and did so at a rate that would take
0.345
hours to do a simulated nanosecond

Mark


On Mon, Jul 28, 2014 at 9:05 AM, Theodore Si sjyz...@gmail.com wrote:


Core t (s)   Wall t (s)(%)
Time: 2345.800   49.744 4715.7
  (ns/day)(hour/ns)
Performance:   69.4790.345


What does 0.345 hour/ns stand for? and the Wall time 49.77s?


On 2014/7/28 14:53, Mark Abraham wrote:

  I plan to put some more of this kind of documentation in the
upcoming User

Guide, but it isn't done yet.

Number of times that section was entered, the total wall time
spent in
that
section, and the total number of processor gigacycles spent in that
section, and percentage of same. Some columns are useful for only
some
kinds of comparisons.

Mark


On Mon, Jul 28, 2014 at 5:35 AM, Theodore Si sjyz...@gmail.com
wrote:

  Thanks a lot!

But I am still confused about other things.
For instance, what do Count, Wall t (s), and G-Cycles mean? It seems
that the last column is the percentage of G-Cycles.
I really hope there is a place where I can find all the relevant
information about the log file.

On 2014/7/28 11:12, Mark Abraham wrote:

   On Jul 28, 2014 4:53 AM, Theodore Si sjyz...@gmail.com wrote:


For example, in the following form, what does Wait + Comm. F
mean? Is

there a webpage that explains the forms in log file?

  Unfortunately not (yet), but they correspond in a
more-or-less clear

way
to
the segments in manual figure 3.16. In this case, to the three
boxes
below
Evaluate potential/forces. Significant time spent here would
indicate
poor balance of compute load.

Mark

     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:          Nodes  Th.  Count   Wall t (s)    G-Cycles      %
-----------------------------------------------------------------------
 Domain decomp.         24    2    801        2.827     351.962    5.7
 DD comm. load          24    2    800        0.077       9.604    0.2
 DD comm. bounds        24    2    800        0.354      44.014    0.7
 Neighbor search        24    2    801        1.077     134.117    2.2
 Launch GPU ops.        24    2  40002        1.518     189.021    3.1
 Comm. coord.           24    2  19200        2.009     250.121    4.0
 Force                  24    2  20001        8.478    1055.405   17.0
 Wait + Comm. F         24    2  20001        1.967     244.901    4.0
 PME mesh               24    2  20001       24.064    2995.784   48.4
 Wait GPU nonlocal      24    2  20001        0.170      21.212    0.3
 Wait GPU local         24    2  20001        0.072       8.935    0.1
 NB X/F buffer ops.     24    2  78402        0.627      78.050    1.3
 Write traj.            24    2      2        0.037       4.569    0.1
 Update                 24    2  20001        0.497      61.874    1.0
 Constraints            24    2  20001        4.198     522.645    8.4
 Comm. energies         24    2    801        0.293      36.500    0.6
 Rest                   24                    1.478     184.033    3.0
-----------------------------------------------------------------------
 Total                  24                   49.744    6192.746  100.0
-----------------------------------------------------------------------
 PME redist. X/F        24    2  40002        6.199     771.670   12.5
 PME spread/gather      24    2  40002        7.194     895.557   14.5
 PME 3D-FFT             24    2  40002        2.727     339.480    5.5
 PME 3D-FFT Comm.       24    2  80004        7.460     928.742   15.0
 PME solve              24    2  20001        0.434      53.968    0.9
-----------------------------------------------------------------------
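The meaning of the last column can be sanity-checked numerically against the table above. The sketch below is my own illustration, not a GROMACS tool; `parse_row` is a hypothetical helper, and the 0.06 tolerance is an assumption to absorb the one-decimal rounding in the log.

```python
TOTAL_GCYCLES = 6192.746  # G-Cycles on the "Total" row of the table above

rows = """\
Force              24  2  20001   8.478  1055.405  17.0
PME mesh           24  2  20001  24.064  2995.784  48.4
Wait + Comm. F     24  2  20001   1.967   244.901   4.0""".splitlines()

def parse_row(line):
    # The task name may contain spaces ("Wait + Comm. F"), so split the
    # six numeric columns off from the right instead of the left.
    name, nodes, th, count, wall, gcycles, pct = line.rsplit(None, 6)
    return (name, int(nodes), int(th), int(count),
            float(wall), float(gcycles), float(pct))

for line in rows:
    name, *_, gcycles, pct = parse_row(line)
    # The % column is this row's G-Cycles as a share of the Total row's
    # G-Cycles, printed with one decimal in the log.
    assert abs(100 * gcycles / TOTAL_GCYCLES - pct) < 0.06, name
```

Running this against the three rows above confirms that % is indeed G-Cycles divided by the Total row's G-Cycles.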

--
Gromacs Users mailing list

* Please search the archive at
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
send a mail to gmx-users-requ...@gromacs.org.


[gmx-users] How do I get to know the meaning of the first column in log file?

2014-07-27 Thread Theodore Si
For example, in the following form, what does Wait + Comm. F mean? Is
there a webpage that explains the forms in the log file?


     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:          Nodes  Th.  Count   Wall t (s)    G-Cycles      %
-----------------------------------------------------------------------
 Domain decomp.         24    2    801        2.827     351.962    5.7
 DD comm. load          24    2    800        0.077       9.604    0.2
 DD comm. bounds        24    2    800        0.354      44.014    0.7
 Neighbor search        24    2    801        1.077     134.117    2.2
 Launch GPU ops.        24    2  40002        1.518     189.021    3.1
 Comm. coord.           24    2  19200        2.009     250.121    4.0
 Force                  24    2  20001        8.478    1055.405   17.0
 Wait + Comm. F         24    2  20001        1.967     244.901    4.0
 PME mesh               24    2  20001       24.064    2995.784   48.4
 Wait GPU nonlocal      24    2  20001        0.170      21.212    0.3
 Wait GPU local         24    2  20001        0.072       8.935    0.1
 NB X/F buffer ops.     24    2  78402        0.627      78.050    1.3
 Write traj.            24    2      2        0.037       4.569    0.1
 Update                 24    2  20001        0.497      61.874    1.0
 Constraints            24    2  20001        4.198     522.645    8.4
 Comm. energies         24    2    801        0.293      36.500    0.6
 Rest                   24                    1.478     184.033    3.0
-----------------------------------------------------------------------
 Total                  24                   49.744    6192.746  100.0
-----------------------------------------------------------------------
 PME redist. X/F        24    2  40002        6.199     771.670   12.5
 PME spread/gather      24    2  40002        7.194     895.557   14.5
 PME 3D-FFT             24    2  40002        2.727     339.480    5.5
 PME 3D-FFT Comm.       24    2  80004        7.460     928.742   15.0
 PME solve              24    2  20001        0.434      53.968    0.9
-----------------------------------------------------------------------


Re: [gmx-users] How do I get to know the meaning of the first column in log file?

2014-07-27 Thread Theodore Si

Thanks a lot!
But I am still confused about other things.
For instance, what do Count, Wall t (s), and G-Cycles mean? It seems that
the last column is the percentage of G-Cycles.
I really hope there is a place where I can find all the relevant
information about the log file.


On 2014/7/28 11:12, Mark Abraham wrote:

On Jul 28, 2014 4:53 AM, Theodore Si sjyz...@gmail.com wrote:

For example, in the following form, what does Wait + Comm. F mean? Is
there a webpage that explains the forms in the log file?

Unfortunately not (yet), but they correspond in a more-or-less clear way to
the segments in manual figure 3.16. In this case, to the three boxes below
Evaluate potential/forces. Significant time spent here would indicate
poor balance of compute load.

Mark
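Mark's remark about load balance can be turned into a quick check on the counters below. This is a rough sketch of my own; the 10% cutoff is an arbitrary illustrative threshold, not a GROMACS recommendation, and the percentages are copied from the timing table quoted below.

```python
# Wait/communication counters and their share of total run cycles,
# taken from the "%" column of the accounting table quoted below.
wait_counters = {
    "Wait + Comm. F":    4.0,
    "Wait GPU nonlocal": 0.3,
    "Wait GPU local":    0.1,
}

THRESHOLD_PCT = 10.0  # arbitrary illustrative cutoff, not an official value

# Flag any wait counter large enough to suggest poor load balance.
imbalanced = [task for task, pct in wait_counters.items()
              if pct >= THRESHOLD_PCT]
if imbalanced:
    print("possible load imbalance in:", ", ".join(imbalanced))
else:
    print("waiting time is small; compute load looks reasonably balanced")
```

For this run all three counters are well under the cutoff, which matches Mark's reading that the balance is fine.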


     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:          Nodes  Th.  Count   Wall t (s)    G-Cycles      %
-----------------------------------------------------------------------
 Domain decomp.         24    2    801        2.827     351.962    5.7
 DD comm. load          24    2    800        0.077       9.604    0.2
 DD comm. bounds        24    2    800        0.354      44.014    0.7
 Neighbor search        24    2    801        1.077     134.117    2.2
 Launch GPU ops.        24    2  40002        1.518     189.021    3.1
 Comm. coord.           24    2  19200        2.009     250.121    4.0
 Force                  24    2  20001        8.478    1055.405   17.0
 Wait + Comm. F         24    2  20001        1.967     244.901    4.0
 PME mesh               24    2  20001       24.064    2995.784   48.4
 Wait GPU nonlocal      24    2  20001        0.170      21.212    0.3
 Wait GPU local         24    2  20001        0.072       8.935    0.1
 NB X/F buffer ops.     24    2  78402        0.627      78.050    1.3
 Write traj.            24    2      2        0.037       4.569    0.1
 Update                 24    2  20001        0.497      61.874    1.0
 Constraints            24    2  20001        4.198     522.645    8.4
 Comm. energies         24    2    801        0.293      36.500    0.6
 Rest                   24                    1.478     184.033    3.0
-----------------------------------------------------------------------
 Total                  24                   49.744    6192.746  100.0
-----------------------------------------------------------------------
 PME redist. X/F        24    2  40002        6.199     771.670   12.5
 PME spread/gather      24    2  40002        7.194     895.557   14.5
 PME 3D-FFT             24    2  40002        2.727     339.480    5.5
 PME 3D-FFT Comm.       24    2  80004        7.460     928.742   15.0
 PME solve              24    2  20001        0.434      53.968    0.9
-----------------------------------------------------------------------



[gmx-users] Some columns in log file.

2014-07-27 Thread Theodore Si
Hi all,

In the log file, what do Count, Wall t (s), and G-Cycles mean? It seems
that the last column is the percentage of G-Cycles.
I really hope there is a place where I can find all the relevant
information about the log file.
Thanks in advance.
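The G-Cycles column can also be cross-checked against Wall t. The following is a back-of-the-envelope sketch based on my own inference from the numbers in the tables above (not official documentation): G-Cycles appears to be CPU cycles summed over all cores, in units of 10^9.

```python
# "Total" row of the accounting table above: 24 entries in the Nodes
# column times 2 in the Th. column, i.e. 48 cores in total.
total_wall_s  = 49.744
total_gcycles = 6192.746
cores         = 24 * 2

# If G-Cycles is aggregate cycles over all cores (in units of 1e9),
# dividing out wall time and core count should recover a plausible
# per-core clock rate.
clock_ghz = total_gcycles / total_wall_s / cores
print(f"implied per-core clock: {clock_ghz:.2f} GHz")  # ~2.59 GHz
```

The result is a realistic CPU clock frequency, which supports the reading that G-Cycles is giga-cycles aggregated across all cores and the last column is each row's share of the total.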