Re: [gmx-users] parallelization of processors on single system/node using MPI

2013-11-28 Thread Szilárd Páll
Yes, you can; besides the minor command-line interface differences, using MPI and thread-MPI works essentially the same way. http://www.gromacs.org/Documentation/Acceleration_and_parallelization#MPI.2c.c2.a0Thread-MPI -- Szilárd On Thu, Nov 28, 2013 at 11:07 AM, Richa Singh

Re: [gmx-users] AMBER ff10 with Gromacs

2013-11-28 Thread Szilárd Páll
to use AMBER99SB force field with the ParmBSC0 nucleic acid parameters, this combination can be found there. Cheers Tom On 11/27/2013 02:06 PM, Szilárd Páll wrote: Hi, If you look at share/gromacs/top/ in the GROMACS installation directory you can see which FFs are included

[gmx-users] feedback wanted: dropping CUDA 3.2/4.0 support

2013-12-01 Thread Szilárd Páll
Hi, In order to ease the maintenance of the native GPU/acceleration in GROMACS, we are removing support for CUDA versions 3.2 and 4.0 in the next version (5.0). We have two options: - Limit the change to performance impact on the GPU kernels due to the removal of the legacy kernels, but do not

Re: [gmx-users] Updating GTX670 PCIE speed from 5GT/s to 8GT/s resulted in about 10% speedup of md_run.

2013-12-04 Thread Szilárd Páll
Hi Henk, Thanks for the useful comments! When you run on a single GPU, you do get full timing details both on CPU and GPU - just have a look at the performance tables at the end of the log file. Alternatively you can simply run nvprof mdrun which will by default give you a nice overview of

Re: [gmx-users] GROMACS 4.6.5 is released

2013-12-04 Thread Szilárd Páll
On Wed, Dec 4, 2013 at 10:15 AM, João Henriques joao.henriques.32...@gmail.com wrote: Soon enough we will have daily releases :P I hope you're not suggesting that we should release less frequently! :) Mark, can you please elaborate just a tiny bit longer on how relevant were the GPU-load

Re: [gmx-users] GROMACS 4.6.5 is released

2013-12-04 Thread Szilárd Páll
, I'll request a new install and tell them to remove the previous version from the cluster. We don't want segmentation faults to happen under certain circumstances :) Thank you, /J On Wed, Dec 4, 2013 at 2:27 PM, Szilárd Páll pall.szil...@gmail.com wrote: On Wed, Dec 4, 2013 at 10:15 AM

Re: [gmx-users] Updating GTX670 PCIE speed from 5GT/s to 8GT/s resulted in about 10% speedup of md_run.

2013-12-05 Thread Szilárd Páll
Just to re-iterate what Henk said, this tweak is rather safe and if it happens to cause problems, from what I've read/heard that will become obvious quite quickly. Hence, it is worth trying - especially if you have a CPU-GPU imbalance with your hardware and simulation system. I have personally

Re: [gmx-users] load imbalance in multiple GPU simulations

2013-12-08 Thread Szilárd Páll
Hi, That's unfortunate, but not unexpected. You are getting a 3x1x1 decomposition where the middle cell has most of the protein, hence most of the bonded forces to calculate, while the ones on the side have little (or none). Currently, the only thing you can do is to try using more domains,

Re: [gmx-users] load imbalance in multiple GPU simulations

2013-12-08 Thread Szilárd Páll
exactly know how to do it (and that's rather low-level stuff anyway). On Mon, Dec 9, 2013 at 1:02 AM, yunshi11 . yunsh...@gmail.com wrote: Hi Szilard, On Sun, Dec 8, 2013 at 2:48 PM, Szilárd Páll pall.szil...@gmail.com wrote: Hi, That's unfortunate, but not unexpected. You are getting a 3x1x1

Re: [gmx-users] Gromos DPPC bilayer: different results for 4.0.7 and 4.6.5

2013-12-24 Thread Szilárd Páll
Just wondering, has anybody done a comparison with the Verlet scheme? It could be useful to know whether it produces results consistent with the 4.6 group scheme implementation or exhibits different behavior. Cheers, -- Szilárd On Sun, Dec 22, 2013 at 1:08 AM, Lutz Maibaum

Re: [gmx-users] Gromos DPPC bilayer: different results for 4.0.7 and 4.6.5

2013-12-29 Thread Szilárd Páll
On Sat, Dec 28, 2013 at 1:43 AM, Lutz Maibaum lutz.maib...@gmail.com wrote: On Dec 24, 2013, at 2:29 AM, Szilárd Páll pall.szil...@gmail.com wrote: Just wondering, has anybody done a comparison with the Verlet scheme? It could be useful to know whether it produces results consistent

Re: [gmx-users] gromacs-4.6.5 installation

2014-01-09 Thread Szilárd Páll
The install location is defined by CMAKE_INSTALL_PREFIX, the default is /usr/local and you need root permission to write there. However, you don't need to install system-wide, just install where you have permission, e.g. your home.
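The advice above can be sketched as a build recipe. The configure/install lines are commented out since they need a GROMACS source tree at hand, and the prefix path is an arbitrary example, not something from the original mail:

```shell
# A user-local (non-root) GROMACS install: point CMAKE_INSTALL_PREFIX at any
# directory you can write to, e.g. somewhere under $HOME.
PREFIX="${HOME:-/tmp}/gromacs-4.6.5"      # example location, adjust to taste
# cmake .. -DCMAKE_INSTALL_PREFIX="$PREFIX"   # configure step (illustrative)
# make -j4 && make install                    # no sudo needed inside $HOME
mkdir -p "$PREFIX" && echo "prefix is writable: $PREFIX"
```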

Re: [gmx-users] tabulated potentials on GPU?

2014-01-21 Thread Szilárd Páll
On Tue, Jan 21, 2014 at 1:45 PM, Mark Abraham mark.j.abra...@gmail.com wrote: Tabulated, yes. With all algorithms, who knows? Tabulated non-bondeds on GPU, I highly doubt. GPUs are for computing, not looking up random-access tables. Actually, the CUDA Ewald non-bonded kernels are tabulated on

Re: [gmx-users] future of shell completions in GROMACS

2014-01-22 Thread Szilárd Páll
On Tue, Jan 21, 2014 at 10:23 AM, Djurre de Jong-Bruinink djurredej...@yahoo.com wrote: Do you use this feature? Do you use this feature with a shell other than bash? Yes, I use this feature and missing it would be a great loss to my workflow. However, I only use it in bash. Is the

Re: [gmx-users] future of shell completions in GROMACS

2014-01-22 Thread Szilárd Páll
On Wed, Jan 22, 2014 at 5:50 PM, jkrie...@mrc-lmb.cam.ac.uk wrote: I +1 shell completions too. It would be ok for me too if they were only in bash. I think my default shell on the cluster may be tcsh but I can change it to bash if necessary. I'd be happy to try helping to implement it but I

Re: [gmx-users] future of shell completions in GROMACS

2014-01-22 Thread Szilárd Páll
Btw, regarding symlink, I'd suggest that at least for mdrun it would be nice to provide completion. This will also work for mdrun-only builds. -- Szilárd On Wed, Jan 22, 2014 at 8:59 PM, Teemu Murtola teemu.murt...@gmail.com wrote: Hi, On Wed, Jan 22, 2014 at 7:40 PM,

Re: [gmx-users] graphics card fail during mdrun

2014-01-27 Thread Szilárd Páll
The behavior of most programs in case of a hardware failure is undefined. However, as an MD simulation is quite sensitive to data corruption, the simulation will most often crash immediately after such an event. That said, I can also imagine that mdrun ends up hanging, e.g. waiting for data from the

Re: [gmx-users] At what point is the random seed generated?

2014-01-29 Thread Szilárd Páll
On Wed, Jan 29, 2014 at 4:46 AM, Trayder Thomas trayder.tho...@monash.edu wrote: Curses! Kinda seemed intuitive that way. I assume the philosophy being worked towards is that every unique run should have a unique .tpr file that it is reproducible from? Curiously, I ran and checked a first

Re: [gmx-users] [gmx-developers] Fatal error: cudaStreamSynchronize failed in cu_blockwait_nb

2014-01-29 Thread Szilárd Páll
Hi Anders, This mail belongs to the users' list. This type of error is typically a sign of the CUDA kernel failing due to a nasty bug in the code or hardware error. The dmesg message is suspicious and it may be a hint of hardware error (see

Re: [gmx-users] [gmx-developers] Fatal error: cudaStreamSynchronize failed in cu_blockwait_nb

2014-01-30 Thread Szilárd Páll
That sounds strange. Does the error happen at the first step? Assuming it does occur within the first 10 steps, here are a few things to try: - Run cuda-memcheck mdrun -nsteps 10; - Try running with the GMX_EMULATE_GPU env. var. set; this will run the GPU acceleration code-path, but will use CPU kernels

Re: [gmx-users] [gmx-developers] Fatal error: cudaStreamSynchronize failed in cu_blockwait_nb

2014-01-30 Thread Szilárd Páll
On Thu, Jan 30, 2014 at 2:10 PM, AOWI (Anders Ossowicki) a...@novozymes.com wrote: Does the error happen at the first step? Assuming it does occur within the first 10 steps, here are a few things to try: It happens immediately. As in: $ time mdrun snip real 0m3.312s user 0m6.768s sys

Re: [gmx-users] [gmx-developers] Fatal error: cudaStreamSynchronize failed in cu_blockwait_nb

2014-01-30 Thread Szilárd Páll
On Thu, Jan 30, 2014 at 4:19 PM, AOWI (Anders Ossowicki) a...@novozymes.com wrote: Well, with a 24k system a single iteration can be done in 2-3 ms, so those 3.3 seconds are mostly initialization and some number of steps - could be one, ten, or even hundred. Sure, but it fails even with

Re: [gmx-users] Is version 5.0 generating portable binaries?

2014-01-31 Thread Szilárd Páll
Note that both the Opteron 2354 K10/Barcelona (http://goo.gl/6hyfO) and the Xeon 53xx / Clovertown (http://goo.gl/6hyfO) are quite old and neither of them supports SSE4.1. Hence, pick SSE2 and build distributable binaries (AFAIK Clovertown does not support the rdtscp instruction). Also note that

Re: [gmx-users] [gmx-developers] Fatal error: cudaStreamSynchronize failed in cu_blockwait_nb

2014-01-31 Thread Szilárd Páll
FYI: GROMACS is known to work on IVB-E + K20 hardware, so I'm still leaning toward thinking that this is either a hardware or a CUDA software error. On Fri, Jan 31, 2014 at 2:57 PM, AOWI (Anders Ossowicki) a...@novozymes.com wrote: That's just weird. The Cuda API error detected does not sound good -

Re: [gmx-users] multiple GPU on multiple nodes

2014-02-04 Thread Szilárd Páll
On Tue, Feb 4, 2014 at 2:31 AM, Mark Abraham mark.j.abra...@gmail.com wrote: On Tue, Feb 4, 2014 at 1:51 AM, cyberjhon cyberj...@hotmail.com wrote: Dear Szilárd Thanks for your answer. To submit the job I do; qsub -l nodes=2:ppn=16,walltime=12:00:00 Then, to run gromacs I can do: aprun

Re: [gmx-users] multiple GPU on multiple nodes

2014-02-04 Thread Szilárd Páll
John, I strongly suggest that you consult the Blue Waters or other XK7 manual or talk to the support team. Understanding this hardware is crucial to getting any reasonable performance. As I said before, the inconsistency in your commands is that you request nnodes x nppn = 2 x 16 MPI ranks which

Re: [gmx-users] z always small for domain decomposition grid?

2014-02-06 Thread Szilárd Páll
More dimensions = more complex communication pattern. See http://pubs.acs.org/doi/abs/10.1021/ct700301q Note that it's not *always* x-y decomposition first, mdrun will look at the dimensions of your system too. One case where manually switching to 3D decomposition can be beneficial is when there

Re: [gmx-users] Since nstlist has no effect on the accuracy

2014-02-06 Thread Szilárd Páll
Exactly. The penalty of large nstlist is large buffer (rlist), but up to 40-50 with CPUs and up to 100 with GPUs is quite realistic. Hint: in 4.6.x you can use GMX_NSTLIST to override the value set in the mdp. From 5.0 there will be a command line option. Cheers, -- Szilárd On Thu, Feb 6, 2014
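A minimal sketch of the 4.6.x override mentioned above; the value 40 is only an example, and the mdrun invocation is illustrative, not taken from the thread:

```shell
# In 4.6.x, GMX_NSTLIST overrides the nstlist value from the .mdp file.
export GMX_NSTLIST=40
# mdrun -deffnm topol        # illustrative invocation; mdrun reads GMX_NSTLIST
echo "nstlist override: ${GMX_NSTLIST}"
```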

Re: [gmx-users] Since nstlist has no effect on the accuracy

2014-02-06 Thread Szilárd Páll
PS: The buffer is automatically calculated based on the verlet-buffer-drift set in the mdp file, for more details see the appendix of http://dx.doi.org/10.1016/j.cpc.2013.06.003. -- Szilárd On Thu, Feb 6, 2014 at 11:04 AM, Szilárd Páll pall.szil...@gmail.com wrote: Exactly. The penalty of large

Re: [gmx-users] Different optimal pme grid ... coulomb cutoff values from identical input files

2014-02-06 Thread Szilárd Páll
Note that your above (CPU) runs had a far from optimal PP-PME balance (PME mesh/force should be close to one). Performance instability can be caused by a busy network (how many nodes are you running on?) or even incorrect affinity settings. If you post a/some log files, we may be able to tell

Re: [gmx-users] gromacs 4.6.4 query

2014-02-10 Thread Szilárd Páll
On Mon, Feb 10, 2014 at 1:27 PM, Chaitali Chandratre chaitujo...@gmail.com wrote: Dear Sir, I have question w.r.t gromacs-4.6.4 installation with GPU support. I have installed non-GPU vesion(for 4.6.4) and it works fine. But while compiling with gpu (-DGMX_GPU=ON

Re: [gmx-users] gromacs 4.6.4 query

2014-02-11 Thread Szilárd Páll
/ Icc details : icc version 13.0.1 (gcc version 4.4.6 compatibility) OpenMM is not available in environment. On Mon, Feb 10, 2014 at 6:15 PM, Szilárd Páll pall.szil...@gmail.comwrote: On Mon, Feb 10, 2014 at 1:27 PM, Chaitali Chandratre chaitujo...@gmail.com

Re: [gmx-users] Justifying 4fs production runs after 1fs equilibrations?

2014-02-11 Thread Szilárd Páll
On Tue, Feb 11, 2014 at 12:11 PM, unitALX alec.zan...@gmail.com wrote: Hello all! In my general situation, I have a batch of homology models that I would like to assess for stability by molecular dynamics. I am working with a postdoc in my lab who has extensive experience with NAMD, but

Re: [gmx-users] GPU job often tailed

2014-02-12 Thread Szilárd Páll
Your mail reads like an FYI, what is the question? In case you were wondering what causes this, it could be simply a soft error, but it's hard to tell. What GPU are you running on? If it's in your own workstation, you could consider running a longer stress-test on it using e.g. CUDA memtest.

Re: [gmx-users] GPU job often tailed

2014-02-12 Thread Szilárd Páll
wrote: I am using GTX690. OK, I will do some tests. Thank you for the helpful advice. Albert On 02/12/2014 01:54 PM, Szilárd Páll wrote: Your mail reads like an FYI, what is the question? In case you were wondering what causes this, it could be simply a soft error, but it's hard to tell

Re: [gmx-users] hybrid CPU/GPU nodes

2014-02-21 Thread Szilárd Páll
From: Szilárd Páll pall.szil...@gmail.com To: Discussion list for GROMACS users gmx-us...@gromacs.org; Gloria Saracino glos...@yahoo.it Sent: Friday, 21 February 2014 1:10 Subject: Re: [gmx-users] hybrid CPU/GPU nodes Hi, Unfortunately the answer is not as simple as use 6-8 cores

Re: [gmx-users] Single or Double precision?

2014-03-12 Thread Szilárd Páll
Check the version header, it contains a Precision field (e.g. mdrun -version). -- Szilárd On Wed, Mar 12, 2014 at 9:03 AM, Dario Corrada dario.corr...@gmail.com wrote: How can I check if a packaged version of GROMACS (such as those available from Fedora or Ubuntu repositories) has been
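As a sketch of the check above, the Precision field can be pulled out of the version header; the sample text below is an assumed, abridged form of `mdrun -version` output, not verbatim:

```shell
# Extract the Precision field from (sample) `mdrun -version` output.
# On a real install: mdrun -version | awk -F': *' '/^Precision/ {print $2}'
sample="Gromacs version:    VERSION 4.6.5
Precision:          single"
precision=$(printf '%s\n' "$sample" | awk -F': *' '/^Precision/ {print $2}')
echo "precision: $precision"
```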

Re: [gmx-users] Thread affinity error when Running Gromacs 4.6.3 on Bluegene/P

2014-03-12 Thread Szilárd Páll
That's a normal side-effect of mdrun trying to set thread affinities [1] on a platform which does not support this. Also, this is not an error, mdrun should have continued to run. Note that you would be much better off upgrading to the latest version, there have been many improvements,

Re: [gmx-users] FYI: How to install Gromacs 4.6.5 on Windows 7: keep cygwin packages 1.7.1-2 for OpenMPI: version 1.7.4-2 for OpenMPI is compatibel

2014-03-13 Thread Szilárd Páll
Hi Chris, I've got a few questions/comments. 7) -DCMAKE_INSTALL_PREFIX=/cygdrive/c/Gromacs465 is the installation directory -DGMX_MPI=OFF With the second installation 10) to 13) you will replace the non-mpi mdrun.exe with one that can run on multiple cores. If you omit this step, the

Re: [gmx-users] Using GPU to do simulations

2014-04-17 Thread Szilárd Páll
On Thu, Apr 17, 2014 at 8:45 PM, mirc...@sjtu.edu.cn wrote: Hi everyone, I am trying to do some pulling simulations by gromacs (i.e., pull a ligand from its binding site), as the system is large I want to accelerate the simulations by GPU. I want to confirm that, Does GPU simulation support

Re: [gmx-users] Langevin sd integrator: wrong average temperature

2014-05-06 Thread Szilárd Páll
also be helpful to have your latest input files that you obtained the above results with, preferably mdp, gro, top. Cheers, -- Szilárd On Mon, May 5, 2014 at 5:32 PM, Szilárd Páll pall.szil...@gmail.com wrote: On Mon, May 5, 2014 at 10:45 AM, Mark Abraham mark.j.abra...@gmail.com wrote

Re: [gmx-users] possible configuration for gromacs gpu node

2014-05-06 Thread Szilárd Páll
Hi, Based on the performance data you provided, I'm afraid a GTX 770 won't be fast enough combined with a E5-2643V2 - at least for your system. Notice in the log output that the Wait GPU local accounts for 46% of the runtime. This is because while the bonded + PME force compute time takes 2.77

Re: [gmx-users] possible configuration for gromacs gpu node

2014-05-07 Thread Szilárd Páll
On Wed, May 7, 2014 at 2:13 PM, Harry Mark Greenblatt harry.greenbl...@weizmann.ac.il wrote: BSD So a job should run the same or faster on 10 cores at 2.5GHz relative to 6 cores at 3.5GHz? Thanks for letting me know Well, the guesstimation is not bulletproof, so if you want to be

Re: [gmx-users] GPU for free energy calculation?

2014-05-10 Thread Szilárd Páll
Hi, The Verlet scheme supports free energy calculation in 5.0; it works with GPUs too, but not in double precision. However, I suggest you consider first whether you really *need* double precision. If you do, it may still be beneficial for you to use the Verlet scheme in 5.0 -- especially if you

Re: [gmx-users] GPU in Ubuntu

2014-05-10 Thread Szilárd Páll
On Sat, May 10, 2014 at 10:40 PM, Albert mailmd2...@gmail.com wrote: it is better to choose a stable Linux for computational work. I recommend the following: Redhat Enterprise, Scientific Linux, SuSE Linux Enterprise. Ubuntu is not stable enough for professional work according to my own

Re: [gmx-users] why two GTX780ti is slower?

2014-05-14 Thread Szilárd Páll
This just tells you that two GPUs were detected but only the first one was automatically selected to be used - presumably because you manually specified the number of ranks (-np or -ntmpi) to be one. However, your mail contains neither the command line you started mdrun with, nor (a link to) the log

Re: [gmx-users] Multi-node GPU runs crashing with a fork() warning

2014-05-21 Thread Szilárd Páll
Hi, Sounds like an MPI or MPI+CUDA issue. Does mdrun run if you use a single GPU? How about two? Btw, unless you have some rather exotic setup, you won't be able to get much improvement from using more than three, at most four GPUs per node - you need CPU cores to match them (and a large system

Re: [gmx-users] mdrun_mpi with cmake

2014-05-22 Thread Szilárd Páll
http://www.gromacs.org/Documentation/Installation_Instructions#4.2._Using_CMake_command-line_options With 4.5/6 we generate a target with CMake called install-mdrun, so make install-mdrun will work. With 5.0 things will change a bit and you'll have to turn on an mdrun-only build. -- Szilárd

Re: [gmx-users] cutoff-scheme in CPU GPU

2014-05-24 Thread Szilárd Páll
Yes. However, a larger nstlist can often be faster with GPUs (and at high parallelization). mdrun will by default try to increase nstlist, but only if the value set in the mdp is quite low, IIRC 20. Hence, if you set nstlist=20 in the mdp, you won't get the automatic switching to 25 or 40 -

Re: [gmx-users] cutoff-scheme in CPU GPU

2014-05-25 Thread Szilárd Páll
On Sat, May 24, 2014 at 9:39 PM, Que Pasa quepas...@gmail.com wrote: Yes. You can. The entire program was built and was functional on CPUs well before GPUs came into the picture. Due to the way calculations are off-loaded to GPUs not all features available on CPUs are available in the GPU

Re: [gmx-users] optimize cpu with gpu node for gromacs

2014-06-17 Thread Szilárd Páll
Hi, In this case you can't do much, the GPU is simply not fast enough to finish its job before the CPU does. What you can try is: - decrease nstlist - especially if it has been increased by mdrun (hint: use GMX_NSTLIST) - try running in mixed GPU+CPU non-bonded mode, e.g. mdrun -ntmpi 2 -gpu_id 00
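The `-gpu_id 00` trick above works because each character of the string assigns a GPU to one PP rank, so "00" puts both thread-MPI ranks on GPU 0. A small sketch of that mapping (illustrative parsing, not GROMACS code):

```shell
# Expand a -gpu_id string into a per-rank GPU mapping: one digit per PP rank.
gpu_id=00
rank=0
for d in $(printf '%s' "$gpu_id" | sed 's/./& /g'); do
  echo "PP rank $rank -> GPU $d"
  rank=$((rank + 1))
done
```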

Re: [gmx-users] Hamiltonian Replica Exchange

2014-06-25 Thread Szilárd Páll
an entirely different system at small lambda factors. -- Szilárd On Wed, Jun 25, 2014 at 6:21 PM, Szilárd Páll pall.szil...@gmail.com wrote: Hi, Next time, you should perhaps use plz, ... (and other eye-catching formatting marks); a nice mix of font colors and typefaces could also help to highlight

Re: [gmx-users] Hamiltonian Replica Exchange

2014-06-27 Thread Szilárd Páll
On Fri, Jun 27, 2014 at 11:36 AM, Thomas Evangelidis teva...@gmail.com wrote: Thank you all for your comment! The helix dimer is not very stable and I know that from extensive accelerated MD simulations I've done with AMBER using various boost values. Now I want to use a different force field

Re: [gmx-users] Gromacs 5.0 download

2014-07-02 Thread Szilárd Páll
The link is correct, something must be broken on your side. Try using e.g. wget: $ wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-5.0.tar.gz ls -l gromacs*.gz --2014-07-02 22:34:44-- ftp://ftp.gromacs.org/pub/gromacs/gromacs-5.0.tar.gz => `gromacs-5.0.tar.gz' Resolving

Re: [gmx-users] Gromacs 5.0 download

2014-07-02 Thread Szilárd Páll
On Wed, Jul 2, 2014 at 11:08 PM, Justin Lemkul jalem...@vt.edu wrote: On 7/2/14, 4:36 PM, Szilárd Páll wrote: The link is correct, something must be broken on your side. Try using e.g. wget: $ wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-5.0.tar.gz ls -l gromacs*.gz --2014-07-02 22:34

Re: [gmx-users] gmx-5.0 features

2014-07-08 Thread Szilárd Páll
On Thu, Jul 3, 2014 at 4:05 PM, Michael Brunsteiner mbx0...@yahoo.com wrote: Hi, I am sorry in case I overlooked the answers in the release-notes, but I didn't find there answers to: 1) does gmx-5.0 support free energy calculations + GPU ? Yes. 2) does gmx-5.0 support double

Re: [gmx-users] hardware setup for gmx

2014-07-08 Thread Szilárd Páll
Hi, Please have a look at the gmx-users history, there have been recent discussions about this topic. Brief answer: * If you only/mostly run GROMACS using GPU, Intel CPUs with many fast cores combined with high-end Geforce GTX cards will give the best performance/$; e.g. currently i7 4930K + GTX

Re: [gmx-users] 3dc + GPU

2014-07-10 Thread Szilárd Páll
On Thu, Jul 10, 2014 at 11:10 AM, Ondrej Kroutil okrou...@gmail.com wrote: Dear Mark, Thanks for the suggestion concerning switching between CPU and GPU. The second option was the one I was looking for! Thank you. And this reminds me I have one remark about the new 5.0 version: in previous versions, when

Re: [gmx-users] Running job on GPUs

2014-07-11 Thread Szilárd Páll
On Fri, Jul 11, 2014 at 12:18 PM, Nidhi Katyal nidhikatyal1...@gmail.com wrote: Hello all I am trying to run my job on 2 nodes by utilizing all available cores. On each node of the cluster, we have two GPUs and two sockets with 8 cores each. Every time I am submitting the job, we find that

Re: [gmx-users] GPU build error

2014-07-15 Thread Szilárd Páll
Hi, Never seen this error, it looks like the compiler chokes on some boost header. At the same time, the __int128-related error seems to indicate that setting the -gcc-version=XYZ flag to some older gcc (e.g. 4.5/4.6) and by that requesting compatibility mode with the specified gcc version could

Re: [gmx-users] illegal instruction while using mdrun command

2014-07-15 Thread Szilárd Páll
On Tue, Jul 15, 2014 at 5:41 PM, Mark Abraham mark.j.abra...@gmail.com wrote: Hi, I would not have thought that was possible, It can happen e.g. if you configure with SSE4.1 acceleration, but you also use e.g. the -mfma4 or -march=bdver1 compiler flag. but the recommended solution is to

Re: [gmx-users] illegal instruction while using mdrun command

2014-07-15 Thread Szilárd Páll
So, Andy, could you share the log file the above command produced? -- Szilárd On Tue, Jul 15, 2014 at 11:22 PM, Szilárd Páll pall.szil...@gmail.com wrote: On Tue, Jul 15, 2014 at 5:41 PM, Mark Abraham mark.j.abra...@gmail.com wrote: Hi, I would not have thought that was possible, It can

Re: [gmx-users] my log file to the mdrun error message that on Tue 15, July...

2014-07-16 Thread Szilárd Páll
Hi, I don't see anything obviously wrong with your setup; there are two peculiarities that I suggest looking into: - you seem to be running in a virtualized environment (at least the hostname indicates this); check if the flags line in /proc/cpuinfo contains rdtscp, and if it does not, try setting
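The cpuinfo check suggested above can be scripted as follows; the flags line is a canned sample (on a real host you would grep /proc/cpuinfo directly), and the CMake option name is quoted from memory as an assumption:

```shell
# Check whether the CPU flag list advertises rdtscp. On real hardware:
#   grep -qw rdtscp /proc/cpuinfo
flags="fpu vme de pse tsc msr sse sse2 ssse3 sse4_1 sse4_2 rdtscp avx"
if printf '%s\n' "$flags" | grep -qw rdtscp; then
  echo "rdtscp present"
else
  echo "rdtscp missing: consider -DGMX_USE_RDTSCP=OFF"   # option name assumed
fi
```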

Re: [gmx-users] cutoff-scheme in CPU GPU

2014-07-16 Thread Szilárd Páll
). Could you explain to me what happens with cutoff-scheme = verlet? And is it possible to have the trajectory with the right bonds (not broken bonds)? Kind Regards, On Sun, May 25, 2014 at 5:39 PM, Szilárd Páll pall.szil...@gmail.com wrote: On Sat, May 24, 2014 at 9:39 PM, Que Pasa

Re: [gmx-users] Regarding Performance Tuning for GROMACS

2014-07-17 Thread Szilárd Páll
Hi, Benchmarking and tuning is generally quite machine-specific, but you could have a look at this great work done by Carsten Kutzner on SuperMUC: http://www.mpibpc.mpg.de/11832367/kutzner13talk-Parco.pdf https://www.mpibpc.mpg.de/14613164/Kutzner_2014_ParCo-conf2013.pdf On Thu, Jul 17, 2014 at

Re: [gmx-users] Re: Can't allocate memory problem

2014-07-18 Thread Szilárd Páll
On Fri, Jul 18, 2014 at 7:31 PM, Yunlong Liu yliu...@jhmi.edu wrote: Hi, Thank you for your reply. I am actually not doing anything unusual, just common MD simulation of a protein. My system contains ~25 atoms, more or less depend on how many water molecules I put in it. The way I

Re: [gmx-users] Re: Can't allocate memory problem

2014-07-18 Thread Szilárd Páll
On Fri, Jul 18, 2014 at 8:25 PM, Yunlong Liu yliu...@jhmi.edu wrote: Hi Mark, I post up my log file for the run here. Thank you. Log file opened on Wed Jul 16 11:26:51 2014 Host: c442-403.stampede.tacc.utexas.edu pid: 31032 nodeid: 0 nnodes: 4 GROMACS:mdrun_mpi_gpu, VERSION 5.0-rc1

Re: [gmx-users] Re: Re: Can't allocate memory problem

2014-07-18 Thread Szilárd Páll
, -- Szilárd Yunlong From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se gromacs.org_gmx-users-boun...@maillist.sys.kth.se on behalf of Szilárd Páll pall.szil...@gmail.com Sent: 19 July 2014 2:41 To: Discussion list for GROMACS users Subject: Re: [gmx-users] Re:

Re: [gmx-users] condition on MPI and GPU tasks

2014-07-22 Thread Szilárd Páll
On Wed, Jul 23, 2014 at 12:32 AM, Sikandar Mashayak symasha...@gmail.com wrote: Hi, I am checking out GPU performance of Gromacs5.0 on a single node of a cluster. The node has two 8-core Sandy Bridge Xeon E5-2670 and two NVIDIA K20x GPUs. My question - is there a restriction on how many

Re: [gmx-users] continuation run segmentation fault

2014-07-24 Thread Szilárd Páll
Hi, There is a certain version of MPI that caused a lot of headache until we realized that it is buggy. I'm not entirely sure what version it was, but I suspect it was the 1.4.3 shipped as default on Ubuntu 12.04 server. I suggest that you try: - using a different MPI version; - using a single

Re: [gmx-users] time accounting in log file with GPU

2014-07-24 Thread Szilárd Páll
On Fri, Jul 25, 2014 at 12:48 AM, Sikandar Mashayak symasha...@gmail.com wrote: Thanks Mark. -noconfout option helps. For benchmarking purposes, additionally to -noconfout I suggest also using: * -resethway or -resetstep: to exclude initialization and load-balancing at the beginning of the run

Re: [gmx-users] Gromacs performance on virtual servers

2014-07-24 Thread Szilárd Páll
On Fri, Jul 25, 2014 at 1:51 AM, Szilárd Páll pall.szil...@gmail.com wrote: Hi In general, virtualization will always have an overhead, but if done well, the performance should be close to that of bare metal. However, for GROMACS the ideal scenario is exclusive host access (including

Re: [gmx-users] Gromacs performance on virtual servers

2014-07-24 Thread Szilárd Páll
Hi In general, virtualization will always have an overhead, but if done well, the performance should be close to that of bare metal. However, for GROMACS the ideal scenario is exclusive host access (including hypervisor) and thread affinities which will both depend on the hypervisor

Re: [gmx-users] Gromacs performance on virtual servers

2014-07-25 Thread Szilárd Páll
On Fri, Jul 25, 2014 at 12:33 PM, Mark Abraham mark.j.abra...@gmail.com wrote: On Fri, Jul 25, 2014 at 1:51 AM, Szilárd Páll pall.szil...@gmail.com wrote: Hi In general, virtualization will always have an overhead, but if done well, the performance should be close to that of bare metal

Re: [gmx-users] Some columns in log file.

2014-07-28 Thread Szilárd Páll
On Mon, Jul 28, 2014 at 6:48 AM, Theodore Si sjyz...@gmail.com wrote: Hi all, In the log file, what do Count, Wall t (s), and G-Cycles mean? It seems that the last column is the percentage of G-Cycles. Wall t (s) = wall-clock time spent in the respective part of code G-Cycles = total giga-CPU

Re: [gmx-users] hardware setup for gmx

2014-07-30 Thread Szilárd Páll
? From: Szilárd Páll pall.szil...@gmail.com To: Michael Brunsteiner mbx0...@yahoo.com Sent: Thursday, July 17, 2014 2:00 AM Subject: Re: [gmx-users] hardware setup for gmx Dear Michael, I'd appreciate if you kept the further discussion on the gmx-users list

Re: [gmx-users] hardware setup for gmx

2014-07-30 Thread Szilárd Páll
On Thu, Jul 31, 2014 at 12:35 AM, Szilárd Páll pall.szil...@gmail.com wrote: Dear Michael, On Wed, Jul 30, 2014 at 1:49 PM, Michael Brunsteiner mbx0...@yahoo.com wrote: Dear Szilard, sorry for bothering you again ... regarding performance tuning by adjusting the VdW and Coulomb cut-offs

Re: [gmx-users] Performance of beowulf cluster

2014-08-05 Thread Szilárd Páll
Hi, You need a fast network to parallelize across multiple nodes. 1 Gb ethernet won't work well and even 10/40 Gb ethernet needs to be of good quality; you'd likely need to buy separate adapters, the on-board ones won't perform well. I posted some links to the list related to this a few days

Re: [gmx-users] Performance of beowulf cluster

2014-08-11 Thread Szilárd Páll
borderline useless, but with the RDMA protocol iWARP over 10 and 40 Gb Ethernet, I've seen people report decent results. Regards, Abhishek On Tue, Aug 5, 2014 at 5:46 PM, Szilárd Páll pall.szil...@gmail.com wrote: Hi, You need fast network to parallelize across multiple nodes. 1 Gb ethernet

Re: [gmx-users] Suggestions for Gromacs Perfomance

2014-08-12 Thread Szilárd Páll
You do not show your exact hardware configuration and with different CPUs you will surely get different performance. You do not show your command line or launch configuration (#ranks, #threads, #separate PME ranks) either, but based on the -gpu_id argument you have there, I assume you are

Re: [gmx-users] AVX vs AVX2

2014-08-13 Thread Szilárd Páll
-decomposition for the next version. I have plans to overclock the CPUs by 10-20%. That should work quite well. -- Szilárd On Tuesday, 12 August 2014 9:11 PM, Szilárd Páll pall.szil...@gmail.com wrote: Yes, it definitely will. The difference will be quite pronounced in CPU-only runs

Re: [gmx-users] Questions on reducing large loading imbalance

2014-08-18 Thread Szilárd Páll
13% is not that large and as far as I can tell the dynamic load balancing has not even kicked in (the above message would show the min/average cell volume due to the domain rescaling). You can try manually turning on load balancing with -dlb yes. -- Szilárd On Mon, Aug 18, 2014 at 7:14 PM,

Re: [gmx-users] GPU recommendations

2014-08-19 Thread Szilárd Páll
Hi, No need to worry, GTX 680 is quite recent and will be supported by GROMACS for at least a couple of years longer. Just make sure this second hand GPU is i) stable (run cuda-memtest) ii) is sold cheaper than a GTX 770 (which is faster). Cheers, -- Szilárd On Mon, Aug 18, 2014 at 4:34 PM,

Re: [gmx-users] Can we set the number of pure PME nodes when using GPUCPU?

2014-08-19 Thread Szilárd Páll
On Tue, Aug 19, 2014 at 4:19 PM, Theodore Si sjyz...@gmail.com wrote: Hi, How can we designate which CPU-only nodes to be PME-dedicated nodes? mpirun -np N mdrun_mpi -npme M Starts N ranks out of which M will be PME-only and (N-M) PP ranks. What mdrun options or what configuration should
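A toy sketch of the rank split implied by `mpirun -np N mdrun_mpi -npme M`: M ranks do PME only and the remaining N-M do PP work. The numbers below are example values, not from the thread:

```shell
# Example rank split for: mpirun -np 16 mdrun_mpi -npme 4
N=16                # total MPI ranks (-np)
M=4                 # dedicated PME-only ranks (-npme)
PP=$((N - M))       # the rest do particle-particle (PP) work
echo "$PP PP ranks + $M PME ranks = $N total"
```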

Re: [gmx-users] [gmx-developers] About dynamics loading balance

2014-08-24 Thread Szilárd Páll
On Thu, Aug 21, 2014 at 8:25 PM, Yunlong Liu yliu...@jh.edu wrote: Hi Roland, I just compiled the latest gromacs-5.0 version released on Jun 29th. I will recompile it as you suggested by using those Flags. It seems like the high loading imbalance doesn't affect the performance as well, which

Re: [gmx-users] Can we set the number of pure PME nodes when using GPUCPU?

2014-08-25 Thread Szilárd Páll
On Mon, Aug 25, 2014 at 8:08 AM, Mark Abraham mark.j.abra...@gmail.com wrote: On Mon, Aug 25, 2014 at 5:01 AM, Theodore Si sjyz...@gmail.com wrote: Hi, https://onedrive.live.com/redir?resid=990FCE59E48164A4!2572&authkey=!AP82sTNxS6MHgUk&ithint=file%2clog

Re: [gmx-users] Can we set the number of pure PME nodes when using GPU+CPU?

2014-08-25 Thread Szilárd Páll
On Mon, Aug 25, 2014 at 7:12 PM, Xingcheng Lin linxingcheng50...@gmail.com wrote: Theodore Si sjyzhxw@... writes: Hi, https://onedrive.live.com/redir?resid=990FCE59E48164A4!2572&authkey=!AP82sTNxS6MHgUk&ithint=file%2clog https://onedrive.live.com/redir?

Re: [gmx-users] Can we set the number of pure PME nodes when using GPU+CPU?

2014-08-26 Thread Szilárd Páll
certainly outperform icc. Cheers, -- Szilárd It only works with 12.1 unless a header of CUDA 5.5 is modified. Theo On 8/25/2014 9:44 PM, Szilárd Páll wrote: On Mon, Aug 25, 2014 at 8:08 AM, Mark Abraham mark.j.abra...@gmail.com wrote: On Mon, Aug 25, 2014 at 5:01 AM, Theodore Si sjyz

Re: [gmx-users] SSE4 → AVX2

2014-08-27 Thread Szilárd Páll
Normally the highest supported SIMD instruction set, which is in your case AVX2, is detected during the configuration phase. The fact that you are getting that message means that: - you are cross-compiling on an SSE4.1 node (e.g. cluster head node) for AVX2 machines; - your compiler does not support
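The cross-compiling case described above can be handled by setting the SIMD level explicitly at configure time. This is a hedged configuration sketch: the `GMX_SIMD` CMake variable and its `AVX2_256`/`AVX_256` values are taken from the GROMACS 5.x build system; adjust to your source tree and compiler.

```shell
# On an SSE4.1 head node, force AVX2 kernels for AVX2 compute nodes:
cmake .. -DGMX_SIMD=AVX2_256
# If the compiler is too old to emit AVX2, fall back to 256-bit AVX:
cmake .. -DGMX_SIMD=AVX_256
```

Forcing a SIMD level higher than the build host supports is fine as long as the resulting binaries only run on the target machines.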

Re: [gmx-users] SSE4 → AVX2

2014-08-27 Thread Szilárd Páll
Correction: I meant AVX_256, not AVX2. -- Szilárd On Wed, Aug 27, 2014 at 12:00 PM, Szilárd Páll pall.szil...@gmail.com wrote: Normally the highest supported SIMD instruction set, which is in your case AVX2, is detected during the configuration phase. The fact that you are getting that message

Re: [gmx-users] SSE4 → AVX2

2014-08-27 Thread Szilárd Páll
on that machine, I suspect you have an outdated compiler. What compiler are you using? 2014-08-27 12:00 GMT+02:00 Szilárd Páll pall.szil...@gmail.com: Normally the highest supported SIMD instruction set, which is in your case AVX2, is detected during the configuration phase. The fact that you

Re: [gmx-users] GPU and MPI

2014-09-02 Thread Szilárd Páll
You may want to try other settings between 4x6 and 24x1 too, e.g. 12x2 or 6x4 - especially if you have a dual-socket 6-core machine with HyperThreading. In my experience, using as many ranks as hardware threads with HT in GPU runs results in big slowdown compared to either not using HT (i.e. 12x1)
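The rank x thread splits suggested above (4x6, 6x4, 12x2, 24x1 on a dual-socket 6-core machine with HyperThreading) can be enumerated as below. This is a sketch under the assumption of 24 hardware threads; `-ntmpi`/`-ntomp` are mdrun's thread-MPI and OpenMP thread-count flags.

```shell
# Enumerate the decompositions mentioned in the reply: each pair of
# (MPI ranks) x (OpenMP threads per rank) covers the 24 hardware threads.
HW_THREADS=24
for RANKS in 4 6 12 24; do
    OMP=$((HW_THREADS / RANKS))
    echo "mdrun -ntmpi $RANKS -ntomp $OMP"
done
```

Benchmarking a few of these on the actual system is usually the only way to find the fastest split.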

Re: [gmx-users] Why compute capability >= 2.0??

2014-09-03 Thread Szilárd Páll
...and because earlier cards were not up to the task of accelerating GROMACS - unless paired with low-end CPUs. -- Szilárd On Wed, Sep 3, 2014 at 1:34 PM, Mark Abraham mark.j.abra...@gmail.com wrote: Hi, Because 2.0 has things that are useful, and maintaining two versions of any code is

Re: [gmx-users] Gromacs 5.0 compilation slower than 4.6.5. What went wrong ?

2014-09-05 Thread Szilárd Páll
Please post the command lines you used to invoke mdrun as well as the log files of the runs you are comparing. Cheers, -- Szilárd On Fri, Sep 5, 2014 at 12:10 PM, David McGiven davidmcgiv...@gmail.com wrote: Dear Gromacs users, I just compiled gromacs 5.0 with the same compiler (gcc 4.7.2),

Re: [gmx-users] Gromacs 5.0 compilation slower than 4.6.5. What went wrong ?

2014-09-05 Thread Szilárd Páll
at 12:37 PM, David McGiven davidmcgiv...@gmail.com wrote: Command line in both cases is : 1st : grompp -f grompp.mdp -c conf.gro -n index.ndx 2nd :mdrun -nt 48 -v -c test.out Log file you mean the standard output/error ? Attached to the email ? Thanks 2014-09-05 12:30 GMT+02:00 Szilárd

Re: [gmx-users] Gromacs 5.0 compilation slower than 4.6.5. What wentwrong ?

2014-09-05 Thread Szilárd Páll
as 12. Regards, Abhishek Acharya On Fri, Sep 5, 2014 at 4:43 PM, David McGiven davidmcgiv...@gmail.com wrote: Thanks Szilard, here it goes! : 4.6.5 : http://pastebin.com/nqBn3FKs 5.0 : http://pastebin.com/kR4ntHtK 2014-09-05 12:47 GMT+02:00 Szilárd Páll

Re: [gmx-users] GPU job failed

2014-09-08 Thread Szilárd Páll
Hi, It looks like you're starting two ranks and passing two GPU IDs so it should work. The only thing I can think of is that you are either getting the two MPI ranks placed on different nodes or that for some reason mpirun -np 2 is only starting one rank (MPI installation broken?). Does the same

Re: [gmx-users] PME

2014-09-08 Thread Szilárd Páll
Hi, By default, there will be no separate PME ranks used with fewer than (AFAIR) 12 ranks (i.e. the default with a small number of ranks is -npme 0). Without separate PME ranks (and without GPUs) there is no PP-PME load balance to tweak, so the PME load is not very relevant from performance
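The default described above can be illustrated as follows. This is a hedged sketch: the threshold of 12 is taken from the snippet ("AFAIR 12"), and `-npme -1` (let mdrun choose) is the stand-in for the automatic case, not GROMACS's real internal heuristic.

```shell
# Below ~12 total ranks, mdrun behaves as if -npme 0 were given
# (no dedicated PME ranks); with more ranks it may split off PME ranks
# automatically (-npme -1 means "let mdrun decide").
NRANKS=8
if [ "$NRANKS" -lt 12 ]; then
    NPME=0
else
    NPME=-1
fi
echo "npme=$NPME"
```

So for a small 8-rank run there is simply no PP-PME split to balance.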

Re: [gmx-users] Cuda CC 2.0 restrictions

2014-09-09 Thread Szilárd Páll
Hi, Is this rather large box a system that can actually be simulated with a useful speed on a single Fermi GPU? Even with 5 fs time-step you won't get much more than 1-1.5 ns/day on a fast Fermi GPU like a GTX 580. Given that you are quite a bit above the limit, unless you are using a quite

Re: [gmx-users] Hyper-threading Gromacs 5.0.1

2014-09-11 Thread Szilárd Páll
That many threads will most likely not be very efficient. If you are running on a single node it could be the case that 1 rank with 24 OpenMP threads will still be the fastest configuration, but 48 will be too much. Depending on how imbalanced your system is, using DD can still be faster, so I
