ble/xeon-scalable-spec-update.html
[5]
https://www.microway.com/knowledge-center-articles/detailed-specifications-of-the-skylake-sp-intel-xeon-processor-scalable-family-cpus/
--
Szilárd
On Thu, Apr 5, 2018 at 10:22 AM, Jochen Hub wrote:
>
>
> Am 03.04.18 um 19:03 schrieb Szilárd Páll:
Hi,
What is the reason for using the custom CMake options? What's the
-rdc=true for -- I don't think it's needed and it can very well be
causing the issue. Have you tried to actually do an
as-vanilla-as-possible build?
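For reference, an as-vanilla-as-possible CUDA build needs little more than
this (install prefix is just a placeholder, not taken from your setup):

  mkdir build && cd build
  cmake .. -DGMX_GPU=ON -DCMAKE_INSTALL_PREFIX=$HOME/gromacs
  make -j 8 && make check && make install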
--
Szilárd
On Thu, Apr 5, 2018 at 6:52 PM, Borchert, Christopher B
ERDC-RDE-
Use Google; the "site:" keyword is ideal for that.
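For example, a query along these lines (the search terms are just an
illustration) restricts Google to the archive:

  site:mailman-1.sys.kth.se gromacs.org_gmx-users "domain decomposition"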
--
Szilárd
On Fri, Apr 6, 2018 at 3:51 PM, ZHANG Cheng <272699...@qq.com> wrote:
> Dear Gromacs,
> I know I can see all the post from
> https://mailman-1.sys.kth.se/pipermail/gromacs.org_gmx-users/
>
>
> but can I search from this link? I do not w
On Wed, Apr 4, 2018 at 4:01 PM, Raman Preet Singh
wrote:
> Szilard:
>
> Thanks for sharing this valuable info. Very helpful.
>
> We are looking forward to procure a Xeon Phi processor based system. Specs
> are yet to be finalized.
FYI: the follow-up to the current Xeon Phi "KNL" has been cancele
Please don't shift the topic of an existing thread, rather create a
new one. Will reply to topics relevant to this discussion here, the
rest separately.
On Wed, Apr 4, 2018 at 4:01 PM, Raman Preet Singh
wrote:
> Szilard:
>
> Thanks for sharing this valuable info. Very helpful.
>
> We are looking
esn't crash and doesn't generate an error message.
> It takes forever without updating the report in the log file or other output files.
>
> Is this a bug?
>
>
>
> On Thu, Mar 29, 2018 at 7:58 AM, Szilárd Páll
> wrote:
>
>> Thanks. Looks like the messages and
On Tue, Apr 3, 2018 at 5:10 PM, Jochen Hub wrote:
>
>
> Am 03.04.18 um 16:26 schrieb Szilárd Páll:
>
>> On Tue, Apr 3, 2018 at 3:41 PM, Jochen Hub wrote:
>>>
>>>benchmar
>>>
>>> Am 29.03.18 um 20:57 schrieb Szilárd Páll:
>>>
>>
On Tue, Apr 3, 2018 at 3:41 PM, Jochen Hub wrote:
>
>
> Am 29.03.18 um 20:57 schrieb Szilárd Páll:
>
>> Hi Jochen,
>>
>> For that particular benchmark I only measured performance with
>> 1,2,4,8,16 cores with a few different kinds of GPUs. It would be easy
>
Hi Jochen,
For that particular benchmark I only measured performance with
1,2,4,8,16 cores with a few different kinds of GPUs. It would be easy
to do the runs on all possible core counts with increments of 1, but
that won't tell a whole lot more than what the performance is of a run
using a E5-262
>> Attachments can't be accepted on the list - please upload to a file sharing
>> service and share links to those.
>>
>> Mark
>>
>> On Wed, Mar 28, 2018 at 6:16 PM Myunggi Yi wrote:
>>
>> > I am attaching the file.
>> >
>> >
Again, please share the exact log files / description of inputs. What
does "bad performance" mean?
--
Szilárd
On Wed, Mar 28, 2018 at 5:31 PM, Myunggi Yi wrote:
> Dear users,
>
> I have two questions.
>
>
> 1. I used to run typical simulations with the following command.
>
> gmx mdrun -deffnm md
Hi,
I can't reproduce your issue; can you please share a full log file?
Cheers,
--
Szilárd
On Wed, Mar 28, 2018 at 5:26 AM, Myunggi Yi wrote:
> Dear users,
>
> I am running simulation with gromacs 2018.1 version
> on a computer with quad core and 1 gpu.
>
> I used to use the following
Hi,
This is an issue I noticed recently, but I thought it was only
affecting some use-cases (or some runtimes). However, it seems it's a
broader problem. It is under investigation, but for now it seems that you can
eliminate it (or strongly diminish its effects) by turning off
GPU-side task timing. You ca
As a side-note, your mdrun invocation does not seem suitable for
GPU-accelerated runs; you'd most likely be better off running fewer ranks.
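As a sketch only (rank/thread counts are made up, tune them to your node),
a single-GPU node is usually better served by something like

  gmx mdrun -ntmpi 2 -ntomp 8 -gpu_id 00 -pin on -deffnm md

than by one rank per core.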
--
Szilárd
On Fri, Mar 23, 2018 at 9:26 PM, Christopher Neale
wrote:
> Hello,
>
> I am running gromacs 5.1.2 on single nodes where the run is set to use 32
rlist >= max(rcoulomb, rvdw) is the correct condition: the list cut-off has to be
at least as long as the longest of the interaction cut-offs.
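In .mdp terms an internally consistent (illustrative) set would be:

  cutoff-scheme = Verlet
  rvdw          = 1.2
  rcoulomb      = 1.2
  rlist         = 1.2  ; >= max(rcoulomb, rvdw); with Verlet mdrun manages the buffer anyway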
--
Szilárd
On Mon, Mar 26, 2018 at 1:05 PM, huolei peng wrote:
> In the user guide of Gromacs 2016 (or 2018), it shows " rlist>=rcoulomb "
> or "rlist>=rvdw" in se
s.org_gmx-users-boun...@maillist.sys.kth.se
> [mailto:gromacs.org_gmx-users-boun...@maillist.sys.kth.se] On Behalf Of
> Szilárd Páll
> Sent: Friday, 23 March 2018 11:45
> To: Discussion list for GROMACS users
> Cc: gromacs.org_gmx-users@maillist.sys.kth.se
> Subject: [EXTERNAL] Re: [gmx-users]
Hi,
Please provide the output of the unit tests with 2018.1; its error reporting
has been improved for exactly these types of errors.
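From the build tree that is simply (standard CMake/CTest, nothing else assumed):

  make check
  # or
  ctest --output-on-failure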
Cheers,
--
Szilárd
On Thu, Mar 22, 2018 at 5:44 PM, Tresadern, Gary [RNDBE]
wrote:
> Hi Mark,
> Thanks, I tried 2018-1 and was hopeful it would solve the proble
Your fftw download is not valid (corrupted, interrupted); as the cmake
output states, it's downloading: http://www.fftw.org/fftw-3.3.4.tar.gz
which has the former MD5 checksum, but yours produces the latter.
Try cleaning your build tree and re-running cmake (to re-download fftw).
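If you want to double-check first, something along these lines should do
(the tarball sits wherever cmake downloaded it inside your build tree):

  md5sum fftw-3.3.4.tar.gz   # compare against the checksum published on fftw.org
  rm -rf build && mkdir build && cd build
  cmake .. -DGMX_BUILD_OWN_FFTW=ON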
--
Szilárd
On T
Note that, unlike rvdw, rcoulomb is tunable when using PME long-range
electrostatics (together with the PME grid spacing).
--
Szilárd
On Mon, Mar 12, 2018 at 3:43 PM, Justin Lemkul wrote:
>
>
> On 3/11/18 7:33 PM, Ahmed Mashaly wrote:
>>
>> Dear users,
>> Can I reduce the rvdw and rcoulomb du
visorycouncil.com/pdf/GROMACS_Analysis_Intel_E5_2697v3_K40_K80_GPUs.pdf
[2] http://mvapich.cse.ohio-state.edu/performance/pt_to_pt
--
Szilárd
On Mon, Mar 12, 2018 at 4:06 PM, Szilárd Páll wrote:
> Hi,
>
> Note that it matters a lot how far you want to parallelize and what
> kind of runs would you do? 10
Hi,
Note that it matters a lot how far you want to parallelize and what
kind of runs you would do. 10 GbE with RoCE may well be enough to
scale across a couple of such nodes, especially if you can squeeze PME
into a single node and avoid the MPI collectives across the network.
You may not even see
'd also recommend the cuda-memtest tool (instead of the AFAIK
outdated/unmaintained memtestG80).
Cheers,
--
Szilárd
>
>
>
> === Why be happy when you could be normal?
>
>
> --
> *From:* Szilárd Páll
> *To:* Discussion list for GROMACS use
BTW, we have considered adding a warmup delay to the tuner; would you be
willing to help test it (or even contribute such a feature)?
--
Szilárd
On Fri, Mar 2, 2018 at 7:28 PM, Szilárd Páll wrote:
> Hi Michael,
>
> Can you post full logs, please? This is likely related to a kn
Hi Michael,
Can you post full logs, please? This is likely related to a known issue
where CPU cores (and in some cases GPUs too) may take longer to clock up
and get a stable performance than the time the auto-tuner takes to do a few
cycles of measurements.
Unfortunately we do not have a good solu
Indeed if the two jobs do not know of each other, both will pin to the same
set of threads -- the default _should_ be 0,1,2,3,4,5 because it assumes
that you want to maximize performance with 6 threads only, and to do so it
pins one thread/core (i.e. uses stride 2).
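For two concurrent runs on, say, a 6-core/12-thread node, a sketch would be
(offsets and strides are illustrative; the right values depend on how your
hardware threads are numbered):

  gmx mdrun -deffnm run1 -ntomp 3 -pin on -pinoffset 0 -pinstride 2 &
  gmx mdrun -deffnm run2 -ntomp 3 -pin on -pinoffset 6 -pinstride 2 &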
When sharing a node among two ru
read some of the GROMACS papers (
http://www.gromacs.org/Gromacs_papers) or tldr see https://goo.gl/AGv6hy
(around slides 12-15).
Cheers,
--
Szilárd
>
>
>
> Regards,
> Mahmood
>
>
>
>
>
>
> On Friday, March 2, 2018, 3:24:41 PM GMT+3:30, Szilárd Páll <
> pall.
Once again, full log files please, not partial cut-and-paste.
Also, you misread something because your previous logs show:
-nb cpu -pme gpu:             56.4 ns/day
-nb cpu -pme gpu -pmefft cpu: 64.6 ns/day
-nb cpu -pme cpu:             67.5 ns/day
So both mixed mode PME and PME on CPU are faster, the latter slig
Have you read the "Types of GPU tasks" section of the user guide?
--
Szilárd
On Thu, Mar 1, 2018 at 3:34 PM, Mahmood Naderan
wrote:
> >Again, first and foremost, try running PME on the CPU, your 8-core Ryzen
> will be plenty fast for that.
>
>
> Since I am a computer guy and not a chemist, the
No, that does not seem to help much because the GPU is rather slow at
getting the PME Spread done (there's still 12.6% wait for the GPU to finish
that), and there are slight overheads that end up hurting performance.
Again, first and foremost, try running PME on the CPU, your 8-core Ryzen
will be
On Thu, Mar 1, 2018 at 8:25 AM, Mahmood Naderan
wrote:
> >(try the other parallel modes)
>
> Do you mean OpenMP and MPI?
>
No, I meant different offload modes.
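Concretely, the 2018 combinations worth comparing are (using your earlier
-deffnm as a placeholder):

  gmx mdrun -nb gpu -pme gpu -deffnm input_md
  gmx mdrun -nb gpu -pme gpu -pmefft cpu -deffnm input_md
  gmx mdrun -nb gpu -pme cpu -deffnm input_md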
>
> >- as noted above try offloading only the nonbondeds (or possibly the
> hybrid PME mode -pmefft cpu)
>
> May I know how? Which par
slightly dated and low-end similar board as
the GTX 950
--
Szilárd
On Wed, Feb 28, 2018 at 10:26 PM, Szilárd Páll
wrote:
> Thanks!
>
> Looking at the log file, as I guessed earlier, you can see the following:
>
> - Given that you have a rather low-end GPU and a fairly fast workst
Thanks!
Looking at the log file, as I guessed earlier, you can see the following:
- Given that you have a rather low-end GPU and a fairly fast workstation
CPU the run is *very* GPU-bound: the CPU spends 16.4 + 54.2 = 70.6% waiting
for the GPU (see lines 628 and 630)
- this means that the default
The list does not accept attachments, so please use a file sharing or
content sharing website so everyone can see your data and has the context.
--
Szilárd
On Wed, Feb 28, 2018 at 7:51 PM, Mahmood Naderan
wrote:
> >Additionally, you still have not provided the *mdrun log file* I
> requested. to
> | GPU   PID  Type  Process name         Usage   |
> |================================================|
> |  0   1180   G    /usr/lib/xorg/Xorg   141MiB  |
> |  0   1651   G    compiz                46MiB  |
> |  0   3604   C    gmx                   90MiB  |
> +------------------------------------------------+
Hi,
Please provide details, e.g. the full log so we know what version, on what
hardware, settings etc. you're running.
--
Szilárd
On Mon, Feb 26, 2018 at 8:02 PM, Mahmood Naderan
wrote:
> Hi,
>
> While the cut-off is set to Verlet and I run "gmx mdrun -nb gpu -deffnm
> input_md", I see that
Hi Michael,
What you observe is most likely due to v2018 by default shifting the
PME work to the GPU, which will often mean fewer CPU cores are needed
and runs become more GPU-bound, leaving the CPU without work for part
of the runtime. This should be easily seen by comparing the log files.
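If you want to verify it, forcing the 2016-style task placement in a 2018
run and diffing the two logs should make it obvious, e.g. (file name is a
placeholder):

  gmx mdrun -nb gpu -pme cpu -deffnm md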
Especia
On Fri, Jan 12, 2018 at 2:35 AM, Jason Loo Siau Ee
wrote:
> Dear Carsten,
>
> Looks like we're seeing the same thing here, but only when using gcc 4.5.3:
>
> Original performance (gcc 5.3.1, AVX512, no hwloc support): 49 ns/day
>
> With hwloc support:
> gcc 4.5.3, AVX2_256 = 67 ns/day
> gcc 4.5.3
ance reasons is that you have at least 1
core per GPU (cores, not hardware threads).
Cheers,
--
Szilárd
>
> Thanks so much,
> Dan
>
>
>
> On Fri, Feb 9, 2018 at 10:27 AM, Szilárd Páll
> wrote:
>
> > On Fri, Feb 9, 2018 at 4:25 PM, Szilárd Páll
> > wrote:
; gencode;arch=compute_70,code=compute_70;-use_fast_math;-
> D_FORCE_INLINES;; ;-msse4.1;-std=c++11;-O3;-DNDEBUG;-funroll-all-
> loops;-fexcess-precision=fast;CUDA driver:9.0CUDA
> runtime: 9.0
> --
>
>
> Osmany
>
>
>
>
&g
PS: Also, what you pasted in here states "2016.4", but your subject
claims version 2018
--
Szilárd
On Thu, Feb 15, 2018 at 6:23 PM, Szilárd Páll wrote:
> Please provide a full log file output.
> --
> Szilárd
>
>
> On Thu, Feb 15, 2018 at 6:11 PM, Osmany Guirola Cru
Please provide a full log file output.
--
Szilárd
On Thu, Feb 15, 2018 at 6:11 PM, Osmany Guirola Cruz
wrote:
> Hi
>
> I am having problems running mdrun command compiled with GPU
> support(cuda 9.0).
> here is the output of the mdrun command
>
>
> Using 1 MPI thread
> Using 4 Open
Option A) Get a gcc 5.x (e.g. compile from source)
Option B) Install CUDA 9.1 (and required driver) which is compatible
with gcc 6.3
(http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html)
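With option A you then point the CUDA build at the newer gcc, e.g. (the
path is a placeholder):

  cmake .. -DGMX_GPU=ON -DCUDA_HOST_COMPILER=/opt/gcc-5.4/bin/g++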
--
Szilárd
On Thu, Feb 15, 2018 at 2:34 PM, Michael Brunsteiner wrote:
>
> Hi,
> I have prob
Hi,
The fix will be released in an upcoming 2016.5 patch release (which
you can see in the Redmine issue page's "Target version" field, BTW).
Cheers,
--
Szilárd
On Mon, Feb 12, 2018 at 2:49 PM, Akshay wrote:
> Hello All,
>
> I was running REMD simulations on Gromacs 2016.1 when my simulation cra
Hi,
Thanks for the report!
Did you build with or without hwloc? There is a known issue with the
automatic pin stride when not using hwloc which will lead to a "compact"
pinning (using half of the cores with 2 threads/core) when <=half of the
threads are launched (instead of using all cores 1 thre
On Thu, Feb 8, 2018 at 10:20 PM, Mark Abraham
wrote:
> Hi,
>
> On Thu, Feb 8, 2018 at 8:50 PM Alex wrote:
>
> > Got it, thanks. Even with the old style input I now have a 42% speed up
> > with PME on GPU. How, how can I express my enormous gratitude?!
> >
>
> Do the science, cite the papers, spr
On Fri, Feb 9, 2018 at 4:25 PM, Szilárd Páll wrote:
> Hi,
>
> First of all,have you read the docs (admittedly somewhat brief):
> http://manual.gromacs.org/documentation/2018/user-guide/
> mdrun-performance.html#types-of-gpu-tasks
>
> The current PME GPU was optimized for s
Hi,
First of all, have you read the docs (admittedly somewhat brief):
http://manual.gromacs.org/documentation/2018/user-guide/mdrun-performance.html#types-of-gpu-tasks
The current PME GPU was optimized for single-GPU runs. Using multiple GPUs
with PME offloaded works, but this mode hasn't been an
On Thu, Feb 8, 2018 at 8:44 PM, Alex wrote:
> With -pme gpu, I am reporting 383.032 ns/day vs 270 ns/day with the 2016.4
> version. I _did not_ mistype. The system is close to a cubic box of water
> with some ions.
>
> Incredible.
>
> Alex
>
> On Thu, Feb 8, 2018 at 1
Note that the actual mdrun performance need not be affected, whether it's
a driver persistence issue (you'll just see a few seconds' lag at mdrun
startup) or some other CUDA application startup-related lag (an mdrun run
does mostly very different kinds of things than this particular set of unit
t
our
rather simple unit tests that should take milliseconds rather than seconds
when PM is on (or X is running).
--
Szilárd
On Thu, Feb 8, 2018 at 6:50 PM, Szilárd Páll wrote:
> On Thu, Feb 8, 2018 at 6:46 PM, Alex wrote:
>
>> Are you suggesting that i should accept these results and
> ms)
> > > > > > > [--] 3 tests from AllocatorTest/0 (0 ms total)
> > > > > > >
> > > > > > > [--] 3 tests from AllocatorTest/1, where TypeParam =
> > > > > > > gmx::Allocator
> > > >
It might help to know which of the unit tests in that group stall. Can
you run them manually (bin/gpu_utils-test) and report back the standard
output?
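That is, from the build directory something like (the CTest name is only
guessed from the output you pasted):

  ./bin/gpu_utils-test
  # or
  ctest -V -R GpuUtil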
--
Szilárd
On Thu, Feb 8, 2018 at 3:56 PM, Alex wrote:
> Nope, still persists after reboot and no other jobs running:
> 9/39 Test #9: GpuUtil
BTW, timeouts can be caused by contention from a stupid number of ranks/tMPI
threads hammering a single GPU (especially with 2 threads/core with HT),
but I'm not sure if the tests are ever executed with such a huge rank count.
--
Szilárd
On Thu, Feb 8, 2018 at 2:40 PM, Mark Abraham
wrote:
> Hi,
>
On Thu, Jan 25, 2018 at 7:23 PM, Searle Duay wrote:
> Hi Ake,
>
> I am not sure, and I don't know how to check the build. But, I see the
> following in the output log file whenever I run GROMACS in PSC bridges:
>
> GROMACS version:2016
> Precision: single
> Memory model: 64 bit
Hi,
You really should update to 2016.4, but if you for some reason cannot, you
can pass the target CUDA architectures to CMake; see GMX_CUDA_TARGET_SM
documented here:
http://manual.gromacs.org/documentation/2016.4/install-guide/index.html#cuda-gpu-acceleration
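For example (the SM values below are only illustrative, use the ones
matching your GPUs):

  cmake .. -DGMX_GPU=ON -DGMX_CUDA_TARGET_SM="35;52;60"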
--
Szilárd
On Tue, Jan 23, 2018 a
Hi,
This is an error emitted by the hwloc library that GROMACS uses (and
OpenMPI too). As there is no indication of other errors, as the
message states you can disable it with the HWLOC_HIDE_ERRORS variable
which should allow the run to start.
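e.g. by prefixing the launch (a sketch only; binary and input names are
placeholders):

  HWLOC_HIDE_ERRORS=1 mpirun -np 4 gmx_mpi mdrun -deffnm md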
You could also recompile GROMACS with a more recent OpenM
Hi Carsten,
The performance behavior you observe is expected; I have observed it
myself. Nothing seems unusual in the performance numbers you report.
The AVX512 clock throttle is additional (10-20% IIRC) to the AVX2 throttle,
and the only code that really gains significantly from AVX512 is the
no
If you had a previously balanced CPU-GPU setup, expect at most ~25%
improvement (with a decently fast >=Maxwell GPU) and at least a mid-sized
simulation system. GPUs are not great at doing small PME grid work and the
GROMACS PME code is really _fast_, so with small inputs and many/fast CPU
cores (<2
Hi Jochen,
Short answer: (most likely) it is due to the large difference in the
amount of bonded work (relative to the total step time). Does CHARMM36
use UB?
Cheers,
--
Szilárd
On Thu, Nov 30, 2017 at 5:33 PM, Jochen Hub wrote:
> Dear all,
>
> I have a question on the performance of the new P
On Mon, Nov 20, 2017 at 8:18 AM, Mark Abraham
wrote:
> Hi,
>
> For NVIDIA GPUs you should use their drivers and a CUDA build. It looks
> like you are using other drivers and an OpenCL build, which is completely
> untested.
>
Note: the OpenCL acceleration using NVIDIA's proprietary library,
distr
On Fri, Nov 10, 2017 at 9:53 AM, Javier Luque Di Salvo
wrote:
> Dear users and developers,
>
>
> I am about to acquire a Workstation to mainly run GROMACS. In choosing the
> right configuration that fits the budget, I came out with this basic arrange
> (from Dell - Precision 7820 Tower):
>
> - P
Hi,
It's not only the core count that matters, the speed of the cores also
does! For that reason, the high-clock Intel i9 and Threadripper CPUs
will be often faster than many dual socket machines with more cores.
BTW, servethehome.com has recently started doing benchmarks with
GROMACS and has ill
On Mon, Oct 16, 2017 at 2:56 PM, Harry Mark Greenblatt
wrote:
> BS”D
>
> Dear Bernard,
>
> Thank you for your reply and info. Our experience with a second graphics
> card (when coupled with the appropriate number of cores) is that we garner
> close to a 100% performance gain. We have access
Hi,
I think the 1070s and 1080s are the best value for money. These days, given
that it costs, I think, ~20% more, the 1080 is probably a bit better
value for money, but it's going to be a bit above $500, I think.
Also note that we're working on new GPU acceleration-related features that
will
--
Szilárd
On Thu, Sep 21, 2017 at 5:54 PM, Szilárd Páll wrote:
> Hi,
>
> A few remarks in no particular order:
>
> 1. Avoid domain-decomposition unless necessary (especially in
> CPU-bound runs, and especially with PME), it has a non-negligible
> overhead (greatest whe
Hi,
A few remarks in no particular order:
1. Avoid domain-decomposition unless necessary (especially in
CPU-bound runs, and especially with PME), it has a non-negligible
overhead (greatest when going from no DD to using DD). Running
multi-threading only typically has better performance. There are
http://manual.gromacs.org/documentation/5.1/user-guide/cmdline.html
--
Szilárd
On Tue, Sep 19, 2017 at 5:06 PM, K. Subashini wrote:
> Hi gromacs users,
>
>
> I am using version 5.1.4
>
>
> How to use gmx_mpi?
>
>
> I got the following error message
>
>
> Executable: /home/subashini/GPU/bin/gmx
2 fs).
That's just a rough estimate, though, and it assumes that you have
enough CPU cores for a balanced run.
--
Szilárd
On Tue, Sep 19, 2017 at 3:16 PM, Szilárd Páll wrote:
> On Tue, Sep 19, 2017 at 2:20 PM, Tomek Stępniewski
> wrote:
>> Hi everybody,
>> I am running gr
On Tue, Sep 19, 2017 at 2:20 PM, Tomek Stępniewski
wrote:
> Hi everybody,
> I am running gromacs 5.1.4 on a system that uses NVIDIA Tesla K40m,
> surprisingly I get a speed of only 15 ns a day when carrying out nvt
> simulations, my colleagues say that on a new GPU like this with my system
> size
at 9:11 PM, RAHUL SURESH wrote:
> Is there something that I can do without reducing the cut off..?
>
> On Mon, 18 Sep 2017 at 10:27 PM, Szilárd Páll
> wrote:
>
>> It means that there is a certain amount of time the CPU and GPU can
>> work concurrently to compute for
It means that there is a certain amount of time the CPU and GPU can
work concurrently to compute forces after which the CPU waits for the
results from the GPU to do the integration. If the CPU finishes a lot
sooner than the GPU, the run will be GPU performance-bound (and
vice-versa) -- which is wha
;>
>> Yes exactly. In this case I would need to manually set pinoffset but this
>> can be but frustrating if other Gromacs users are not binding :)
>> Would it be possible to fix this in the default algorithm, though am
>> unaware of other issues it might cause? Also
I would need to manually set pinoffset but this
> can be but frustrating if other Gromacs users are not binding :)
> Would it be possible to fix this in the default algorithm, though am
> unaware of other issues it might cause? Also mutidir is not convenient
> sometimes when job crashes i
> file would be difficult.
Let me answer that separately to emphasize a few technical issues.
Cheers,
--
Szilárd
> -J
>
>
> On Thu, Sep 14, 2017 at 11:26 AM, Szilárd Páll
> wrote:
>
>> On Wed, Sep 13, 2017 at 11:14 PM, gromacs query
>> wrote:
>> > Hi Szi
ulation). However, note that if you are sharing a node with
others and their jobs are not correctly affinitized, those processes
will affect the performance of your job.
> I think still they will need -pinoffset. Could you
> please suggest what best can be done in such case?
See above.
Che
y option 3
>> myself. As there are other users from different places who may not bother
>> using option 3. I think I would need to ask the admin to force option 1 but
>> before that I will try option 3.
>>
>> JIom
>>
>> On Wed, Sep 13, 2017 at 7:10 PM, Szi
hese
> chosen pinoffsets will make sense or not as I don't know what pinoffset I
> would need to set
> - if I have to submit many jobs together and slurm chooses different/same
> node itself then I think it is difficult to define pinoffset.
>
> -
> J
>
> On Wed, Sep 13,
My guess is that the two jobs are using the same cores -- either all
cores/threads or only half of them, but the same set.
You should use -pinoffset; see:
- Docs and example:
http://manual.gromacs.org/documentation/2016/user-guide/mdrun-performance.html
- More explanation on the thread pinning b
Hi,
More threads can help, but unless I miscounted, the number of threads you
started in your previous run was larger than the number of cores (i.e. you
oversubscribed the CPU cores). That will never be beneficial with GROMACS.
The run stats look more reasonable now, but your run still starts wit
I'm still not sure why the GPU-accelerated EM would run slower than
the CPU-only run -- unless the GPU in question is at best as fast at
computing nonbonded interactions as the CPU (cores) assigned to the
job.
Have you looked at log files?
--
Szilárd
On Sat, Sep 9, 2017 at 1:33 AM, Alex wrote
If you run MPI-enabled GROMACS and start N simulations with M(=1) ranks
each, you will have N*M processes. That's how MPI works. However, you
do not necessarily need to use MPI; the default build uses thread-MPI,
for instance.
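To illustrate the difference (command lines are sketches, names are
placeholders):

  mpirun -np 4 gmx_mpi mdrun -deffnm sim   # real MPI: 4 separate processes
  gmx mdrun -ntmpi 4 -deffnm sim           # thread-MPI: 1 process, 4 ranks as threads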
--
Szilárd
On Fri, Sep 8, 2017 at 6:00 AM, MING HA wrote:
> Hi all,
>
Fixed it, thanks Åke!
--
Szilárd
On Thu, Sep 7, 2017 at 6:36 PM, Åke Sandgren wrote:
> Hi!
>
> The link for the release notes of gromacs 2016.3 on www.gromacs.org is
> pointing to 2016.1
>
> --
> Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
> Internet: a...@hpc2n.umu.se Phone: +4
Hi,
Only one of the files has the right permissions, but based on that the
TL;DR conclusion is that you seem to be oversubscribing the nodes and
missing thread pinning.
You're also running at a fairly high parallelization (a 17700-atom system on
72 cores across 3 nodes) and to get good performance
That kind of cmake invocation with MPI on without setting the
compilers to the MPI wrappers can often fail depending on how well the
MPI installation is detected by cmake. Try setting CMAKE_C_COMPILER
and CMAKE_CXX_COMPILER variables to the MPI compiler wrappers, i.e.
mpicc/mpic++ or similar.
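i.e. something along the lines of (wrapper names depend on your MPI
installation):

  cmake .. -DGMX_MPI=ON -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx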
--
Sz
vsites case it is 5 fs.
> Is that
> real cycle and time accounting for neighbor search, force, PME mesh
> etc,
> and time/step that you refer to is the wall time/call count?
Not sure if I understand the question, but I'll try to answer it:
that is simulation time/w
Indeed, "Wall t" is real application wall-time, nanoseconds/day is the
typical molecular dynamics performance unit that corresponds to the
effective amount of simulation throughput (note that this however
depends on the time-step and without that specified it is not useful
to compare to other runs)
Hi,
One of the systems is The Prace test case A
(http://www.prace-ri.eu/ueabs/), though the setup offered there is
only the vsites-enabled version.
Cheers,
--
Szilárd
On Wed, Jul 12, 2017 at 10:34 AM, wrote:
> Hi there,
> where can I find the benchmark input files for the 2 cases reported i
Additionally, if you care about getting the best performance on your
hardware be aware that distro-packaged GROMACS versions are generally
optimized for the lowest common denominator CPU architectural capabilities
and will not be able to make the best use of modern CPU instructions sets.
Hence, if
Leila,
If you want to use only one GPU, pass that GPU's ID to mdrun, e.g. -gpu_id
0 for the 1st one. You'll also want to pick the right number of cores for
the run, it will surely not make sense to use all 96. Also make sure to pin
the threads (-pin on).
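Putting that together, an illustrative command line (the core count is made
up, tune it to your machine) would be:

  gmx mdrun -gpu_id 0 -ntomp 12 -pin on -deffnm md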
However, I strongly recommend that you rea
t's what I was expecting to see there, so not sure what was wrong.
>
>
> S. Mostafa Razavi
> Graduate Research Assistant
> Molecular Simulation Lab
> Department of Chemical and Biomolecular Engineering
> The University of Akron, OH, USA
>
> On Fri, Jul 7, 2017 at 2:
> $ ./omp_helloc
> Hello World from thread = 0
> Number of threads = 4
> Hello World from thread = 3
> Hello World from thread = 1
> Hello World from thread = 2
>
>
>
>
> Thank you,
>
> S. Mostafa Razavi
>
>
> On Fri, Jul 7, 2017 at 9:12 AM, Szil
Raman,
First, please please have a look at the documentation and check out the
relevant GROMACS papers (http://www.gromacs.org/Gromacs_papers), in
particular http://dx.doi.org/10.1002/jcc.24030
On Tue, Jul 4, 2017 at 5:56 PM, Raman Preet Singh <
ramanpreetsi...@hotmail.com> wrote:
> Hello,
>
>
>
You've got a pretty strange beast there with 4 CPU sockets 24 cores each,
one very fast GPU and two rather slow ones (about 3x slower than the first).
If you want to do a single run on this machine, I suggest trying to
partition the ranks across the GPUs so that you get a decent balance, e.g.
you c
Hi,
That looks weird. Can you compile a sample OpenMP code, e.g. this
https://computing.llnl.gov/tutorials/openMP/samples/C/omp_hello.c
--
Szilárd
On Fri, Jul 7, 2017 at 11:59 AM, Seyed Mostafa Razavi
wrote:
> Hello,
>
> I'm trying to install Gromacs (gromacs-2016.3) on my CentOS 6.5 system. I
Hi,
By quality are you referring to performance? The ideal CPU-GPU balance
depends on the type of runs (system, settings, FF) too. However, in most
cases, for current GROMACS versions you will need many and fast cores to
balance with the 1080 Ti -- which is about the fastest gaming GPU out there.
Petar,
Please show full log files, it would be easier to give suggestions. One
thing that is clearly suboptimal is that you're using #ranks = #cores,
you'd probably get better performance with 4-6 ranks per node.
Cheers,
--
Szilárd
On Sun, Jun 11, 2017 at 11:17 AM, Petar Zuvela
wrote:
> Dear
I suggest that you try to understand the parallelization option of mdrun
first. You seem to be mixing MPI and thread-MPI. The latter won't work for
multi-node runs. If you sort that out and launch with correctly set up PBS
parameters (ranks and cores/threads per rank), your runs should be fine.
htt
On Thu, Jun 1, 2017 at 9:39 PM, Elizabeth Ploetz wrote:
> However, if most runs are group scheme, a quick check could show whether
>
> jumps are present in runs that i) do PP-PME tuning ii) if logs go truncated
> during continuation at least whether they do use separate PME ranks
> (because other
On Wed, May 31, 2017 at 9:44 PM, Christopher Neale <
chris.ne...@alum.utoronto.ca> wrote:
> 1. Once you identify a continuation (with associated run script) that
> gives the discontinuity, if you run many repeats of the original
> continuation then does the jump always occur or only sometimes?
>
>
On Thu, May 25, 2017 at 2:09 PM, Marcelo Depólo wrote:
> Hi,
>
>
> I had the same struggle benchmarking a similar system last week. Just for
> curiosity, could you tell us the performance you get when sharing your GPU
> with multiple jobs?
BTW, interpreting some performance number is not alwa