Re: [gmx-users] Regarding Gromacs 5.0.3 parallel computation

2014-12-10 Thread Téletchéa Stéphane

On 10/12/2014 07:22, Bikash Ranjan Sahoo wrote:

I tried to do a small simulation in GROMACS 4.5.5 using 30 cores for 200
ps. The computation time was 4.56 minutes. The command used was dplace -c
0-29 mdrun -v -s md.tpr -c md.gro -nt 30.

Next I ran the same system using GROMACS 5.0.3. The command used was
dplace -c 0-29 mpirun -np 30 mdrun_mpi -v -s md.tpr -c md.gro. The
simulation was extremely slow and took 37 minutes to complete only 200 ps
of MD.


Dear Bikash,

You are more or less benchmarking threads versus MPI, right?

Note also that dplace with such a round number (30) is probably not
optimal either, unless your processor core count divides 30 evenly
(hexa-cores?).
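
As a rough like-for-like sketch (file names taken from your commands, and
assuming a thread-MPI build of 5.0.3 so that -nt is available), the two
runs would be:

# GROMACS 4.5.5, thread-parallel run on 30 cores:
mdrun -nt 30 -v -s md.tpr -c md.gro

# GROMACS 5.0.3, same thread count, letting mdrun pin threads instead of dplace:
mdrun -nt 30 -pin on -v -s md.tpr -c md.gro

Only then would the two timings compare the same parallelization scheme.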


Best,

Stéphane

--
Team Protein Design In Silico
UFIP, UMR 6286 CNRS,
UFR Sciences et Techniques,
2, rue de la Houssinière, Bât. 25,
44322 Nantes cedex 03, France
Tél : +33 251 125 636
Fax : +33 251 125 632
http://www.ufip.univ-nantes.fr/ - http://www.steletch.org



Re: [gmx-users] Regarding Gromacs 5.0.3 parallel computation

2014-12-10 Thread Bikash Ranjan Sahoo
Dear Dr. Stéphane,
   Thank you for your quick reply. How can I solve this? Can you please
guide me to the right command to make mdrun run as fast as the GROMACS 4.5.5
-nt command? I am pasting the architecture of my cluster below. Kindly help
me understand how I can modify the mdrun_mpi command to use multiple
cores on my cluster.


Interactive Use 276 CPU Avail
-

Free Interactive Use CPU List:
 100 101 102 103 104 105 106 107 108 109 110 111
 112 113 114 115 116 117 118 119 120 121 122 123
 124 125 126 127 128 129 130 131 132 133 134 135
 136 138 139 140 141 142 143 144 145 146 147 148
 149 150 151 152 153 154 155 156 157 158 159 160
 161 162 163 164 165 166 167 168 169 170 171 172
 173 174 175 176 177 178 179 180 181 182 183 184
 185 186 187 188 189 190 191 193 194 195 196 197
 198 199 203 209 217 218 220 221 222 223 224 225
 226 227 228 229 230 231 232 233 234 235 236 237
 238 239 240 241 242 243 244 245 246 247 248 249
 250 251 252 253 254 255 256 257 258 259 260 261
 262 263 264 265 266 267 268 269 270 271 272 273
 274 275

System Topology (CPU position)
--

          column1   column2   column3   column4
row6      120-131     48-59   192-203   264-275
row5      108-119     36-47   180-191   252-263
row4       96-107     24-35   168-179   240-251
row3        84-95     12-23   156-167   228-239
row2        72-83      0-11   144-155   216-227
row1        60-71         -   132-143   204-215

Memory Information
--

Interactive Use Free/Total Memory:
  1469 GB /  1514 GB


Best regards
In anticipation of your reply
Bikash


On Wed, Dec 10, 2014 at 5:38 PM, Téletchéa Stéphane 
stephane.teletc...@univ-nantes.fr wrote:

 On 10/12/2014 07:22, Bikash Ranjan Sahoo wrote:

 I tried to do a small simulation in GROMACS 4.5.5 using 30 cores for 200
 ps. The computation time was 4.56 minutes. The command used was dplace
 -c 0-29 mdrun -v -s md.tpr -c md.gro -nt 30.

 Next I ran the same system using GROMACS 5.0.3. The command used was
 dplace -c 0-29 mpirun -np 30 mdrun_mpi -v -s md.tpr -c md.gro. The
 simulation was extremely slow and took 37 minutes to complete only 200 ps
 of MD.


 Dear Bikash,

 You are more or less benchmarking threads versus MPI, right?

 Note also that dplace with such a round number (30) is probably not
 optimal either, unless your processor core count divides 30 evenly
 (hexa-cores?).

 Best,

 Stéphane




Re: [gmx-users] Regarding Gromacs 5.0.3 parallel computation

2014-12-10 Thread Téletchéa Stéphane

On 10/12/2014 12:28, Bikash Ranjan Sahoo wrote:

Dear Dr. Stéphane,
 Thank you for your quick reply. How can I solve this? Can you please
guide me to the right command to make mdrun run as fast as the GROMACS
4.5.5 -nt command? I am pasting the architecture of my cluster below.
Kindly help me understand how I can modify the mdrun_mpi command to
use multiple cores on my cluster.




Dear Bikash,

Why not use the -nt option in GROMACS 5.0.3 too?
My point was that you should use the same parameters when comparing
performance...
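
As far as I know, -nt is only available when thread-MPI is compiled in,
and configuring with GMX_MPI=ON switches thread-MPI off. A rough sketch of
such a build of 5.0.x, with a purely illustrative install prefix (and
GMX_SIMD in place of the older GMX_CPU_ACCELERATION), would be:

# Thread-MPI build sketch; the install prefix is hypothetical:
cmake .. -DCMAKE_INSTALL_PREFIX=/user1/GROMACS-5.0.3-threadmpi \
         -DGMX_MPI=OFF -DGMX_THREAD_MPI=ON \
         -DGMX_PREFER_STATIC_LIBS=ON -DGMX_BUILD_OWN_FFTW=ON \
         -DGMX_X11=OFF -DGMX_SIMD=SSE4.1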


What I would do is:

mdrun -nt 30 -pin auto

Try first using the GROMACS tools (-pin auto) rather than the dplace
command, since in your case you ask it to spread your job over different
CPUs according to the diagram you showed. At the very least, you could
use the logical architecture for the dplace command:
http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=linuxdb=bkssrch=fname=/SGI_Developer/LX_AppTune/sgi_html/ch05.html

Last, you should also tune the PME decomposition using the -tunepme
option on a short run, to see whether the rather rough domain
decomposition that mdrun makes as a first approach is optimal for your
system.
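
For example, a minimal sketch of such a short run (thread-MPI build assumed;
the output name bench_tune is only illustrative):

# -tunepme is on by default; -maxh 0.1 caps the wall time at about 6 minutes
# and -resethway drops the first half of the run from the timing statistics:
mdrun -nt 30 -pin on -tunepme -maxh 0.1 -resethway -s md.tpr -deffnm bench_tune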


You should also try to use multiples of the CPU core count (and powers of
two where possible) for maximal performance; in your case, probably
something like -nt 12, -nt 24 or -nt 36, since each CPU seems to be a
12-core...
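
A quick way to check this (reusing the same md.tpr; the output names are
only illustrative) would be something like:

# Short benchmark at each thread count, then compare the Performance
# lines reported at the end of each log:
for n in 12 24 36; do
    mdrun -nt $n -pin on -maxh 0.1 -resethway -s md.tpr -deffnm bench_nt$n
done
grep Performance bench_nt*.log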


See their respective manuals and command-line help for more info.

Best,

Stéphane

--
Team Protein Design In Silico
UFIP, UMR 6286 CNRS,
UFR Sciences et Techniques,
2, rue de la Houssinière, Bât. 25,
44322 Nantes cedex 03, France
Tél : +33 251 125 636
Fax : +33 251 125 632
http://www.ufip.univ-nantes.fr/ - http://www.steletch.org



[gmx-users] Regarding Gromacs 5.0.3 parallel computation

2014-12-09 Thread Bikash Ranjan Sahoo
Dear All,
I have installed GROMACS 5.0.3 on the cluster and would like to thank
Dr. Mark for his valuable suggestions and guidance. I am facing some
problems with the computation speed in 5.0.3. A comparative study of the
same system in GROMACS 4.5.5 and 5.0.3 on the same cluster, using an equal
number of nodes, showed an extremely slow simulation for the latter. The
commands I used for installation are pasted below.


cmake .. -DCMAKE_INSTALL_PREFIX=/user1/GROMACS-5.0.3 -DGMX_MPI=ON
-DGMX_THREAD_MPI=ON -DGMX_PREFER_STATIC_LIBS=ON -DGMX_BUILD_OWN_FFTW=ON
-DGMX_X11=OFF -DGMX_CPU_ACCELERATION=SSE4.1

I tried to do a small simulation in GROMACS 4.5.5 using 30 cores for 200
ps. The computation time was 4.56 minutes. The command used was dplace -c
0-29 mdrun -v -s md.tpr -c md.gro -nt 30.

Next I ran the same system using GROMACS 5.0.3. The command used was
dplace -c 0-29 mpirun -np 30 mdrun_mpi -v -s md.tpr -c md.gro. The
simulation was extremely slow and took 37 minutes to complete only 200 ps
of MD.
Even the energy minimization of a small protein, which converges in 4.5.5
in a few seconds, is taking a long time in 5.0.3. Kindly suggest where the
problem is. Is there a problem in my installation procedure (in the cmake
commands)?

Thanking You
In anticipation of your reply.
Bikash, Osaka, Japan


P.S. The dplace -c 0-29 is for serial assignment of CPUs on my cluster.
Kindly ignore it if you are using qsub.
-- 
Gromacs Users mailing list

* Please search the archive at 
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
mail to gmx-users-requ...@gromacs.org.