As Mark said, please share the *entire* log file. Among other
important things, the result of PP-PME tuning is not included above.
However, I suspect that in this case scaling is strongly affected
by the small size of the system you are simulating.
--
Szilárd
On Sun, Nov 10, 2013 at 5:28 AM,
Let's not hijack James' thread as your hardware is different from his.
On Tue, Nov 5, 2013 at 11:00 PM, Dwey Kauffman mpi...@gmail.com wrote:
Hi Szilard,
Thanks for your suggestions. I am indeed aware of this page. On an 8-core
AMD machine with 1 GPU, I am very happy with its performance. See
On Thu, Nov 7, 2013 at 6:34 AM, James Starlight jmsstarli...@gmail.com wrote:
I've come to the conclusion that simulations with 1 or 2 GPUs simultaneously give
me the same performance
mdrun -ntmpi 2 -ntomp 6 -gpu_id 01 -v -deffnm md_CaM_test
mdrun -ntmpi 2 -ntomp 6 -gpu_id 0 -v -deffnm
Timo,
Have you used the default settings, that is, one rank per GPU? If that is
the case, you may want to try using multiple ranks per GPU; this can
often help when you have 4-6 cores per GPU. Separate PME ranks are not
switched on by default with GPUs; have you tried using any?
Cheers,
--
Szilárd Páll
detected:
#0: NVIDIA GeForce GTX TITAN, compute cap.: 3.5, ECC: no, stat:
compatible
#1: NVIDIA GeForce GTX TITAN, compute cap.: 3.5, ECC: no, stat:
compatible
James
2013/11/4 Szilárd Páll pall.szil...@gmail.com
You can use the -march=native flag with gcc to optimize
On Tue, Nov 5, 2013 at 9:55 PM, Dwey Kauffman mpi...@gmail.com wrote:
Hi Timo,
Can you provide a benchmark with 1 Xeon E5-2680 and 1 Nvidia
K20X GPGPU on the same 29420-atom test?
Are these two GPU cards (within the same node) connected by a SLI (Scalable
Link Interface) ?
That should be enough. You may want to use the -march (or equivalent)
compiler flag for CPU optimization.
Cheers,
--
Szilárd Páll
On Sun, Nov 3, 2013 at 10:01 AM, James Starlight jmsstarli...@gmail.com wrote:
Dear Gromacs Users!
I'd like to compile the latest 4.6 GROMACS with native GPU
Brad,
These numbers seem rather low for a standard simulation setup! Did
you use a particularly long cut-off or a short time-step?
Cheers,
--
Szilárd Páll
On Fri, Nov 1, 2013 at 6:30 PM, Brad Van Oosten bv0...@brocku.ca wrote:
I'm not sure about the prices of these systems anymore
before buying. (Note
that I have never tried it myself, so I can't provide more details or
vouch for it in any way.)
Cheers,
--
Szilárd Páll
On Fri, Nov 1, 2013 at 3:08 AM, David Chalmers
david.chalm...@monash.edu wrote:
Hi All,
I am considering setting up a small cluster to run Gromacs jobs
You can use the -march=native flag with gcc to optimize for the CPU
you are building on, or e.g. -march=corei7-avx-i for Intel Ivy Bridge
CPUs.
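For instance (a sketch; the build directory and GPU option are assumptions, not from the thread), the flag can be handed to a GROMACS 4.6 CMake configure via CMAKE_C_FLAGS:

```shell
# -march=native tunes for the build host; swap in -march=corei7-avx-i
# when building on one machine for Ivy Bridge compute nodes.
MARCH="-march=native"
echo "cmake .. -DGMX_GPU=ON -DCMAKE_C_FLAGS=${MARCH} -DCMAKE_CXX_FLAGS=${MARCH}"
```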
--
Szilárd Páll
On Mon, Nov 4, 2013 at 12:37 PM, James Starlight jmsstarli...@gmail.com wrote:
Szilárd, thanks for suggestion!
What kind of CPU
Hi Carsten,
On Thu, Oct 24, 2013 at 4:52 PM, Carsten Kutzner ckut...@gwdg.de wrote:
On Oct 24, 2013, at 4:25 PM, Mark Abraham mark.j.abra...@gmail.com wrote:
Hi,
No. mdrun reports the stride with which it moves over the logical cores
reported by the OS, setting the affinity of GROMACS
there are
a few analysis tools that support OpenMP and even with those I/O will
be a severe bottleneck if you were considering using the Phi-s for
analysis.
So for now, I would stick to using only the CPUs in the system.
Cheers,
--
Szilárd Páll
On Thu, Oct 10, 2013 at 12:58 PM, Arun Sharma arunsharma_
Hi,
Admittedly, both the documentation on these features and the
communication on the known issues with these aspects of GROMACS have
been lacking.
Here's a brief summary/explanation:
- GROMACS 4.5: implicit solvent simulations possible using mdrun-gpu
which is essentially mdrun + OpenMM, hence
On Mon, Sep 16, 2013 at 7:04 PM, PaulC paul.cah...@uk.fujitsu.com wrote:
Hi,
I'm attempting to build GROMACS 4.6.3 to run entirely within a single Xeon
Phi (i.e. native) with either/both Intel MPI/OpenMP for parallelisation
within the single Xeon Phi.
I followed these instructions from
, 2013 at 4:35 PM, Szilárd Páll szilard.p...@cbr.su.se wrote:
Hi,
First of all, icc 11 is not well tested and there have been reports
of it compiling broken code. This could explain the crash, but
you'd need to do a bit more testing to confirm. Regarding the GPU
detection error, if you use
Looks like you are compiling 4.5.1. You should try compiling the
latest version in the 4.5 series, 4.5.7.
--
Szilárd
On Sun, Sep 15, 2013 at 6:39 PM, Muthukumaran R kuma...@bicpu.edu.in wrote:
hello,
I am trying to install gromacs in cygwin but after issuing make,
installation stops with the
FYI, I've filed a bug report which you can track if interested:
http://redmine.gromacs.org/issues/1334
--
Szilárd
On Sun, Sep 1, 2013 at 9:49 PM, Szilárd Páll szilard.p...@cbr.su.se wrote:
I may have just come across this issue as well. I have no time to
investigate, but my guess is that it's
Hi,
First of all, icc 11 is not well tested and there have been reports
of it compiling broken code. This could explain the crash, but
you'd need to do a bit more testing to confirm. Regarding the GPU
detection error, if you use a driver which is incompatible with the
CUDA runtime (at least as
On Tue, Sep 3, 2013 at 9:50 PM, Guanglei Cui
amber.mail.arch...@gmail.com wrote:
Hi Mark,
I agree with you and Justin, but let's just say there are things that are
out of my control ;-) I just tried SSE2 and NONE. Both failed the
regression check.
That's alarming, with
On Thu, Aug 29, 2013 at 7:18 AM, Gianluca Interlandi
gianl...@u.washington.edu wrote:
Justin,
I respect your opinion on this. However, in the paper indicated below by BR
Brooks they used a cutoff of 10 A on LJ when testing IPS in CHARMM:
Title: Pressure-based long-range correction for
I may have just come across this issue as well. I have no time to
investigate, but my guess is that it's related to some thread-safety
issue with thread-MPI.
Could one of you please file a bug report on redmine.gromacs.org?
Cheers,
--
Szilárd
On Thu, Aug 8, 2013 at 5:52 PM, Brad Van Oosten
That should never happen. If mdrun is compiled with GPU support and
GPUs are detected, the detection stats should always get printed.
Can you reliably reproduce the issue?
--
Szilárd
On Fri, Aug 2, 2013 at 9:50 AM, Jernej Zidar jernej.zi...@gmail.com wrote:
Hi there.
Lately I've been
Dear Ramon,
Thanks for the kind words!
On Tue, Jun 18, 2013 at 10:22 AM, Ramon Crehuet Simon
ramon.creh...@iqac.csic.es wrote:
Dear Szilard,
Thanks for your message. Your help is priceless and helps advance science
more than many publications. I extend that to many experts who kindly and
Hi,
The Intel compilers are only recommended for pre-Bulldozer AMD
processors (K10: Magny-Cours, Istanbul, Barcelona, etc.). On these,
PME non-bonded kernels (not the RF or plain cut-off!) are 10-30%
slower with gcc than with icc. The icc-gcc difference is the smallest
with gcc 4.7, typically
, 28 May 2013 at 19:50
*From:* Szilárd Páll szilard.p...@cbr.su.se
*To:* Discussion list for GROMACS users gmx-users@gromacs.org
*Subject:* Re: Re: [gmx-users] GPU-based workstation
Dear all,
As far as I understand, the OP is interested in hardware for *running*
GROMACS 4.6 rather than
On Thu, Jul 25, 2013 at 5:55 PM, Mark Abraham mark.j.abra...@gmail.com wrote:
That combo is supposed to generate a CMake warning.
I also get a warning during linking that some shared library will have
to provide some function (getpwuid?) at run time, but the binary is
static.
That warning
On Fri, Jul 19, 2013 at 6:59 PM, gigo g...@ibb.waw.pl wrote:
Hi!
On 2013-07-17 21:08, Mark Abraham wrote:
You tried ppn3 (with and without --loadbalance)?
I was testing on 8-replicas simulation.
1) Without --loadbalance and -np 8.
Excerpts from the script:
#PBS -l nodes=8:ppn=3
The message is perfectly normal. When you do not use all available
cores/hardware threads (seen as CPUs by the OS), to avoid potential
clashes, mdrun does not pin threads (i.e. it lets the OS migrate
threads). On NUMA systems (most multi-CPU machines), this will cause
performance degradation as
Depending on the level of parallelization (number of nodes and number
of particles/core) you may want to try:
- 2 ranks/node: 8 cores + 1 GPU, no separate PME (default):
mpirun -np 2*Nnodes mdrun_mpi [-gpu_id 01 -npme 0]
- 4 ranks per node: 4 cores + 1 GPU (shared between two ranks), no
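Making the first option concrete for a hypothetical cluster (the node count below is an assumption, not from the thread):

```shell
# Option 1: 2 PP ranks per node, each driving 8 cores + 1 GPU (ids 0 and 1),
# no separate PME ranks. This turns "-np 2*Nnodes" into an actual number.
NNODES=4                  # assumed cluster size, for illustration only
NRANKS=$((2 * NNODES))
echo "mpirun -np ${NRANKS} mdrun_mpi -gpu_id 01 -npme 0"
```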
FYI: The MKL FFT has been shown to be 30% or more slower than FFTW 3.3.
--
Szilárd
On Thu, Jul 11, 2013 at 1:17 AM, Éric Germaneau german...@zoho.com wrote:
I have the same feeling too but I'm not in charge of it unfortunately.
Thank you, I appreciate.
On 07/11/2013 07:15 AM, Mark Abraham
Hi,
Is affinity setting (pinning) on? What compiler are you using? There
are some known issues with Intel OpenMP getting in the way of the
internal affinity setting. To verify whether this is causing a
problem, try turning off pinning (-pin off).
Cheers,
--
Szilárd
On Tue, Jul 9, 2013 at 5:29
Just a note regarding the performance issues mentioned. You are
using reaction-field electrostatics, a case in which by default there is
very little force workload left for the CPU (only the bondeds) and
therefore the CPU idles most of the time. To improve performance, use
-nb gpu_cpu with multiple
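A minimal sketch of such a launch (the rank/thread counts and GPU ids are assumptions, not from the thread):

```shell
# -nb gpu_cpu splits the nonbonded work: local interactions stay on the GPU,
# nonlocal ones go to the CPU, so the otherwise idle CPU cores contribute.
NRANKS=2
echo "mdrun -nb gpu_cpu -ntmpi ${NRANKS} -ntomp 6 -gpu_id 01"
```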
PS: the error message is referring to the *driver* version, not the
CUDA toolkit/runtime version.
--
Szilárd
On Tue, Jul 9, 2013 at 11:15 AM, Szilárd Páll szilard.p...@cbr.su.se wrote:
Tesla C1060 is not compatible - which should be shown in the log and
standard output.
Cheers,
--
Szilárd
On Tue, Jul 9, 2013 at 10:54 AM, Albert mailmd2...@gmail.com wrote:
Dear:
I've installed a gromacs-4.6.3 in a GPU cluster, and I obtained the
following information for testing:
NOTE:
On Tue, Jul 9, 2013 at 11:20 AM, Albert mailmd2...@gmail.com wrote:
On 07/09/2013 11:15 AM, Szilárd Páll wrote:
Tesla C1060 is not compatible - which should be shown in the log and
standard output.
Cheers,
--
Szilárd
THX for kind comments.
do you mean C1060 is not compatible with cuda
FYI: 4.6.2 contains a bug related to thread-affinity setting which
will lead to a considerable performance loss (I've seen 35%) as well
as often inconsistent performance - especially with GPUs (a case in
which one would run many OpenMP threads/rank). My advice is that you
either use the code from
From: Szilárd Páll szilard.p...@cbr.su.se
To: Mare Libero marelibe...@yahoo.com; Discussion list for GROMACS users
gmx-users@gromacs.org
Sent: Thursday, June 27, 2013 10:47 AM
Subject: Re: [gmx-users] Installation on Ubuntu 12.04LTS
On Thu, Jun 27, 2013 at 12:57 PM
On Mon, Jun 24, 2013 at 4:43 PM, Szilárd Páll szilard.p...@cbr.su.se wrote:
On Sat, Jun 22, 2013 at 5:55 PM, Mirco Wahab
mirco.wa...@chemie.tu-freiberg.de wrote:
On 22.06.2013 17:31, Mare Libero wrote:
I am assembling a GPU workstation to run MD simulations, and I was
wondering if anyone has
On Thu, Jun 27, 2013 at 12:57 PM, Mare Libero marelibe...@yahoo.com wrote:
Hello everybody,
Does anyone have any recommendation regarding the installation of gromacs 4.6
on Ubuntu 12.04? I have the nvidia-cuda-toolkit that comes in synaptic
(4.0.17-3ubuntu0.1 installed in
Thanks Mirco, good info, your numbers look quite consistent. The only
complicating factor is that your CPUs are overclocked by different
amounts, which changes the relative performances somewhat compared to
non-overclocked parts.
However, let me list some prices to show that the top-of-the line
I strongly suggest that you consider the single-chip GTX cards instead
of a dual-chip one; from the point of view of price/performance you'll
probably get the most from a 680 or 780.
You could ask why, so here are the reasons:
- The current parallelization scheme requires domain-decomposition to
On Sat, Jun 22, 2013 at 5:55 PM, Mirco Wahab
mirco.wa...@chemie.tu-freiberg.de wrote:
On 22.06.2013 17:31, Mare Libero wrote:
I am assembling a GPU workstation to run MD simulations, and I was
wondering if anyone has any recommendation regarding the GPU/CPU
combination.
From what I can see,
If you have a solid example that reproduced the problem, feel free to
file an issue on redmine.gromacs.org ASAP. Briefly documenting your
experiments and verification process on the issue report page can
help developers in giving you faster feedback as well as with
accepting the report as a
Dear Ramon,
Compute capability does not reflect the performance of a card; rather, it
is an indicator of what functionalities the GPU provides - more
like a generation number or feature-set version.
Quadro cards are typically quite close in performance/$ to Teslas with
roughly 5-8x *lower*
-funroll-all-loops -fexcess-precision=fast -O3 -DNDEBUG
All the regression tests failed. So it appears that, at least for my system,
I need to include the directives to not use the external BLAS/LAPACK.
Amil
On Jun 10, 2013, at 12:12 PM, Szilárd Páll [via GROMACS] wrote:
Amil,
It looks like
Amil,
It looks like there is a mixup in your software configuration and
mdrun is linked against libguide.so, the OpenMP library part of the
Intel compiler v11 which gets loaded early and is probably causing the
crash. This library was probably pulled in implicitly by MKL which the
build system
On Sat, Jun 8, 2013 at 9:21 PM, Albert mailmd2...@gmail.com wrote:
Hello:
Recently I noticed something strange about GROMACS 4.6.2 on a GPU workstation.
On my GTX 690 machine, when I run an MD production I found that ECC is on.
However, on another GTX 590 machine, I found that ECC was off:
4
On Wed, Jun 5, 2013 at 4:35 PM, João Henriques
joao.henriques.32...@gmail.com wrote:
Just to wrap up this thread, it does work when the mpirun is properly
configured. I knew it had to be my fault :)
Something like this works like a charm:
mpirun -npernode 2 mdrun_mpi -ntomp 8 -gpu_id 01
mdrun is not blind; it's just that the current design does not report the
hardware of all compute nodes used. Whatever CPU/GPU hardware mdrun reports in
the log/std output is *only* what rank 0, i.e. the first MPI process,
detects. If you have a heterogeneous hardware configuration, in most
cases you should be
-nt is mostly a backward compatibility option and sets the total
number of threads (per rank). Instead, you should set both -ntmpi
(or -np with MPI) and -ntomp. However, note that unless a single
mdrun uses *all* cores/hardware threads on a node, it won't pin the
threads to cores. Failing to pin
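A sketch of a full-node launch under the constraint just described (the core count is an assumption, not from the thread):

```shell
# Assumed: a 16-core node. 2 thread-MPI ranks x 8 OpenMP threads covers all
# cores, so mdrun will pin threads; using fewer threads than cores would
# leave pinning off and let the OS migrate threads.
NCORES=16
NTMPI=2
NTOMP=$((NCORES / NTMPI))
echo "mdrun -ntmpi ${NTMPI} -ntomp ${NTOMP}"
```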
Just a few minor details:
- You can set the affinities yourself through the job scheduler which
should give nearly identical results compared to the mdrun internal
affinity if you simply assign cores to mdrun threads in a sequential
order (or with an #physical cores stride if you want to use
Thanks for reporting this.
The best would be a redmine bug with a tpr, command-line invocation for
reproduction, as well as log output to see what software and hardware
configuration you are using.
Cheers,
--
Szilárd
On Mon, Jun 3, 2013 at 2:46 PM, Johannes Wagner
johannes.wag...@h-its.org wrote:
/ HRB 337446
Managing Directors:
Dr. h.c. Klaus Tschira
Prof. Dr.-Ing. Andreas Reuter
On 03.06.2013, at 16:01, Szilárd Páll szilard.p...@cbr.su.se wrote:
Thanks for reporting this.
The best would be a redmine bug with a tpr, command-line invocation for
reproduction, as well as log output to see
There's no ibverbs support, so pick your favorite/best MPI
implementation; more than that you can't do.
--
Szilárd
On Mon, Jun 3, 2013 at 2:54 PM, Bert bert.u...@gmail.com wrote:
Dear all,
My cluster has a FDR (56 Gb/s) Infiniband network. It is well known that
there is a big difference
10.04 comes with gcc 4.3 and 4.4 which should both work (we even test
them with Jenkins).
Still, you should really get a newer gcc, especially if you have an
8-core AMD CPU (= either Bulldozer or Piledriver) both of which are
fully supported only by gcc 4.7 and later. Additionally, AFAIK the
Dear all,
As far as I understand, the OP is interested in hardware for *running*
GROMACS 4.6 rather than developing code or running LINPACK.
To get best performance it is important to use a machine with hardware
balanced for GROMACS' workloads. Too little GPU resources will result
in CPU
On Sat, May 25, 2013 at 2:16 PM, Broadbent, Richard
richard.broadben...@imperial.ac.uk wrote:
I've been running on my university's GPU nodes; these have one Xeon E5 (6 cores,
12 threads) and 4 Nvidia GTX 690s. My system is 93,000 atoms of DMF
under NVE. The performance has been a little
On Tue, May 28, 2013 at 10:14 AM, James Starlight
jmsstarli...@gmail.com wrote:
I've found a GTX Titan with 6 GB of RAM and a 384-bit bus. The price of such a card is
equal to the price of the latest Tesla cards.
Nope!
Titan: $1000
Tesla K10: $2750
Tesla K20(c): $3000
TITAN is cheaper than any Tesla and
With the Verlet cutoff scheme (new in 4.6) you get much better control
over the drift caused by (missed) short-range interactions; you just
set a maximum allowed target drift and the buffer will be calculated
accordingly. Additionally, with the Verlet scheme you are free to
tweak the neighbor
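In 4.6 mdp terms, the options involved look roughly like this (a sketch; the values are illustrative, not taken from the thread):

```
cutoff-scheme       = Verlet
verlet-buffer-drift = 0.005   ; max allowed drift (kJ/mol/ps per atom); buffer sized from this
nstlist             = 20      ; with Verlet, the list update interval becomes a free performance knob
```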
The thread-MPI library provides the thread affinity setting
functionality to mdrun, hence certain parts of it will always be
compiled in, even with GMX_MPI=ON. Apparently, the Cray compiler does
not like some of the thread-MPI headers. Feel free to file a bug
report on redmine.gromacs.org, but
The answer is in the log files, in particular the performance summary
should indicate where is the performance difference. If you post your
log files somewhere we can probably give further tips on optimizing
your run configurations.
Note that with such a small system the scaling with the group
On Fri, May 17, 2013 at 2:48 PM, Djurre de Jong-Bruinink
djurredej...@yahoo.com wrote:
The answer is in the log files, in particular the performance summary
should indicate where is the performance difference. If you post your
log files somewhere we can probably give further tips on optimizing
I'm not sure what you mean by threads. In GROMACS this can refer to
either thread-MPI or OpenMP multi-threading. To run within a single
compute node a default GROMACS installation using either of the two
aforementioned parallelization methods (or a combination of the two)
can be used.
--
Szilárd
PS: if your compute-nodes are Intel of some recent architecture
OpenMP-only parallelization can be considerably more efficient.
For more details see
http://www.gromacs.org/Documentation/Acceleration_and_parallelization
--
Szilárd
On Thu, May 16, 2013 at 7:26 PM, Szilárd Páll szilard.p
Hi,
Such an issue typically indicates a GPU kernel crash. This can be
caused by a large variety of factors from program bug to GPU hardware
problem. To do a simple check for the former please run with the CUDA
memory checker, e.g:
/usr/local/cuda/bin/cuda-memcheck mdrun [...]
Additionally, as
This error means that your binaries contain machine instructions that
the processor you run them on does not support. The most probable
cause is that you compiled the binaries on a machine with different
architecture than the one you are running on.
Cheers,
--
Szilárd
On Mon, Apr 29, 2013 at
Have you tried running on CPUs only just to see if the issue persists?
If the issue still occurs with the same binary on the same
hardware running on CPUs only, I doubt it's a problem in the code.
Do you have ECC on?
--
Szilárd
On Sun, Apr 28, 2013 at 5:27 PM, Albert mailmd2...@gmail.com
On Mon, Apr 29, 2013 at 2:41 PM, Albert mailmd2...@gmail.com wrote:
On 04/28/2013 05:45 PM, Justin Lemkul wrote:
Frequent failures suggest instability in the simulated system. Check your
.log file or stderr for informative Gromacs diagnostic information.
-Justin
my log file didn't have
while mdrun was running?
Cheers,
--
Szilárd
On Mon, Apr 29, 2013 at 3:32 PM, Albert mailmd2...@gmail.com wrote:
On 04/29/2013 03:31 PM, Szilárd Páll wrote:
The segv indicates that mdrun crashed and not that the machine was
restarted. The GPU detection output (both on stderr and log) should
On Mon, Apr 29, 2013 at 3:51 PM, Albert mailmd2...@gmail.com wrote:
On 04/29/2013 03:47 PM, Szilárd Páll wrote:
In that case, while it isn't very likely, the issue could be caused by
some implementation detail which aims to avoid performance loss caused
by an issue in the NVIDIA drivers
You got a warning at configure time that the nvcc host compiler can't
be set because the MPI compiler wrappers are used. Because of this,
nvcc is using gcc to compile CPU code, which chokes on the icc flags.
You can:
- set CUDA_HOST_COMPILER to the mpicc backend, i.e. icc or
- let cmake detect MPI
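A sketch of the first fix (the icc path below is an assumption; point it at whatever backend your mpicc wraps):

```shell
# Tell nvcc to use the MPI wrapper's backend compiler for host code,
# instead of falling back to gcc with icc-specific flags.
ICC=/opt/intel/bin/icc   # assumed install location, adjust to your system
echo "cmake .. -DGMX_GPU=ON -DCUDA_HOST_COMPILER=${ICC}"
```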
Hi,
You should really check out the documentation on how to use mdrun 4.6:
http://www.gromacs.org/Documentation/Acceleration_and_parallelization#Running_simulations
Brief summary: when running on GPUs every domain is assigned to a set
of CPU cores and a GPU, hence you need to start as many PP
On Tue, Apr 9, 2013 at 6:52 PM, David van der Spoel sp...@xray.bmc.uu.sewrote:
On 2013-04-09 18:06, Mikhail Stukan wrote:
Dear experts,
I have the following question. I am trying to compile GROMACS 4.6.1 with
GPU acceleration and have the following diagnostics:
# cmake .. -DGMX_DOUBLE=ON
On Mon, Apr 22, 2013 at 8:49 AM, Albert mailmd2...@gmail.com wrote:
On 04/22/2013 08:40 AM, Mikhail Stukan wrote:
Could you explain which hardware do you mean? As far as I know, K20X
supports double precision, so I would assume that double precision GROMACS
should be realizable on it.
Hi,
Your problem will likely be solved by not writing the rpath to the
binaries, which can be accomplished by setting -DCMAKE_SKIP_RPATH=ON.
This will mean that you will have to make sure that the library path
is set for mdrun to work.
If that does not fully solve the problem, you might have to
On Thu, Apr 18, 2013 at 6:17 PM, Mike Hanby mha...@uab.edu wrote:
Thanks for the reply, so the next question, after I finish building single
precision non parallel, is there an efficient way to kick off the double
precision build, then the single precision mpi and so on?
Or do I need to
On Sat, Apr 13, 2013 at 3:30 PM, Mirco Wahab
mirco.wa...@chemie.tu-freiberg.de wrote:
On 12.04.2013 20:20, Szilárd Páll wrote:
On Fri, Apr 12, 2013 at 3:45 PM, 라지브간디 ra...@kaist.ac.kr wrote:
Can cygwin recognize the CUDA installed in Win 7? If so, how do I link
them?
Good question, I've
On Sat, Apr 13, 2013 at 5:27 PM, Szilárd Páll szilard.p...@cbr.su.se wrote:
On Sat, Apr 13, 2013 at 3:30 PM, Mirco Wahab
mirco.wa...@chemie.tu-freiberg.de wrote:
On 12.04.2013 20:20, Szilárd Páll wrote:
On Fri, Apr 12, 2013 at 3:45 PM, 라지브간디 ra...@kaist.ac.kr wrote:
Can cygwin recognize
Indeed it's strange. In fact, it seems that CUDA detection did not
even run; there should be a message about whether it found the toolkit or
not just before the "Enabling native GPU acceleration" line - and the
enabling should not even happen without CUDA detected.
Unrelated, but do you really need MPI with
On Fri, Apr 12, 2013 at 3:45 PM, 라지브간디 ra...@kaist.ac.kr wrote:
Thanks for your answers. I have uninstalled the MPI and reinstalled
the CUDA, and got the same issue. As you mentioned before, I noticed that
it struggles to detect the CUDA.
Do you mean that you reconfigured without
Hi,
No, it just means that *your simulation* does not scale. The question
is very vague, hence impossible to answer without more details.
However, assuming that you are not running a, say, 5000-atom system
over 6 nodes, the most probable reason is that you have 6 Sandy Bridge
nodes with 12-16
On Wed, Apr 10, 2013 at 3:34 AM, Benjamin Bobay bgbo...@ncsu.edu wrote:
Szilárd -
First, many thanks for the reply.
Second, I am glad that I am not crazy.
Ok so based on your suggestions, I think I know what the problem is/was.
There was a sander process running on 1 of the CPUs. Clearly
Hi Andrew,
As others have said, 40x speedup with GPUs is certainly possible, but more
often than not comparisons leading to such numbers are not entirely fair -
at least from a computational perspective. The most common case is when
people compare legacy, poorly (SIMD)-optimized codes with some
On Wed, Apr 10, 2013 at 4:48 PM, 陈照云 chenzhaoyu...@gmail.com wrote:
I have tested gromacs-4.6.1 with k20.
But when I run the mdrun, I met some problems.
1. Does the GPU only support float (single-precision) acceleration?
Yes.
2. My configure options were -DGMX_MPI and -DGMX_DOUBLE.
But if I run in parallel with mpirun, it
On Wed, Apr 10, 2013 at 4:50 PM, 申昊 shen...@mail.bnu.edu.cn wrote:
Hello,
I want to ask some questions about load imbalance.
1 Here are the messages resulted from grompp -f md.mdp -p topol.top -c
npt.gro -o md.tpr
NOTE 1 [file md.mdp]:
The optimal PME mesh load for parallel
On Wed, Apr 10, 2013 at 4:24 PM, Szilárd Páll szilard.p...@cbr.su.se wrote:
Hi Andrew,
As others have said, 40x speedup with GPUs is certainly possible, but more
often than not comparisons leading to such numbers are not entirely fair -
at least from a computational perspective. The most
Hi Ben,
That performance is not reasonable at all - neither for CPU only run on
your quad-core Sandy Bridge, nor for the CPU+GPU run. For the latter you
should be getting more like 50 ns/day or so.
What's strange about your run is that the CPU-GPU load balancing is picking
a *very* long cut-off
On Mon, Apr 8, 2013 at 1:37 PM, Justin Lemkul jalem...@vt.edu wrote:
On Mon, Apr 8, 2013 at 2:28 AM, Hrachya Astsatryan hr...@sci.am wrote:
Dear all,
We have installed the latest version of Gromacs (version 4.6) on our
cluster by the following step:
* cmake .. -DGMX_MPI=ON
Hi,
As the error message states, the reason for the failed configuration is
that CMake can't auto-detect MPI, which is needed when you are not providing
the MPI compiler wrapper as the compiler.
If you want to build with MPI you can either let CMake auto-detect MPI and
just compile with the C
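The two configure routes might be sketched as follows (a hypothetical out-of-source build; only the flags shown are implied by the message):

```shell
# (a) let CMake auto-detect MPI while keeping the plain C compiler:
echo "cmake .. -DGMX_MPI=ON"
# (b) or hand CMake the MPI compiler wrapper directly:
echo "cmake .. -DGMX_MPI=ON -DCMAKE_C_COMPILER=mpicc"
```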
Hi,
You can certainly use your hardware setup. I assume you've been looking at
the log/console output based on which it might seem that mdrun is only
using the GPUs in the first (=master) node. However, that is not the case,
it's just that the current hardware and launch configuration reporting
Hi,
If mdrun says that it could not detect GPUs it simply means that the GPU
enumeration found no GPUs, otherwise it would have printed what was found.
This is rather strange because mdrun uses the same mechanism as the
deviceQuery SDK example. I really don't have a good idea what could be the
On Thu, Mar 28, 2013 at 4:26 PM, Chandan Choudhury iitd...@gmail.com
wrote:
On Thu, Mar 28, 2013 at 4:09 PM, Szilárd Páll szilard.p...@cbr.su.se
wrote:
Hi,
If mdrun says that it could not detect GPUs it simply means that the GPU
enumeration found no GPUs, otherwise it would have printed
Hi,
Actually, if you don't want to run across the network, with those Westmere
processors you should be fine running OpenMP across the two sockets,
i.e.
mdrun -ntomp 24
or to run without HyperThreading (which can be sometimes faster) just use
mdrun -ntomp 12 -pin on
Now, when it comes to GPU
FYI: As much as Intel likes to say that you can just run MPI/MPI+OpenMP
code on MIC, you will probably not be impressed with the performance (it
will be *much* slower than a Xeon CPU).
If you want to know why and what/when are we doing something about it,
please read my earlier comments on MIC
Hi Quentin,
That's just a way of saying that something is wrong with one of the
following (in order of likelihood):
- your GPU driver is too old, hence incompatible with your CUDA version;
- your GPU driver installation is broken;
- your GPU is behaving in an unexpected/strange
FYI: On your machine, running OpenMP across two sockets will probably not be
very efficient. Depending on the input and on how high a parallelization you
are running at, you could be better off running multiple MPI ranks per
GPU. This is a bit of an unexplained feature due to it being complicated to
Hi Chris,
You should be able to run on MIC/Xeon Phi as these accelerators, when used
in symmetric mode, behave just like a compute node. However, for two main
reasons the performance will be quite bad:
- no SIMD accelerated kernels for MIC;
- no accelerator-specific parallelization implemented
As Mark said, we need concrete details to answer the question:
- log files (all four of them: 1/2 nodes, 4.5/4.6)
- hardware (CPUs, network)
- compilers
The 4.6 log files contain much of the second and third point except the
network.
Note that you can compare the performance summary table's
that it's because it's almost
entirely water and hence probably benefits from the Group scheme
optimizations for water described on the Gromacs website.
Thanks again for the explanation,
Reid
On Mon, Mar 4, 2013 at 3:45 PM, Szilárd Páll szilard.p...@cbr.su.se
wrote:
Hi
On Thu, Mar 7, 2013 at 2:02 PM, Berk Hess g...@hotmail.com wrote:
Hi,
This was only a note, not a fix.
I was just trying to say that what linear algebra library you use for
Gromacs is irrelevant in more than 99% of the cases.
But having said that, the choice of library should not