Re: [OMPI users] openmpi, 1.6.3, mlx4_core, log_num_mtt and Debian/vanilla kernel

2013-02-21 Thread Stefan Friedel

On Thu, Feb 21, 2013 at 12:23:14PM +0100, Paul Kapinos wrote:
> The MTT parameter mess is well known, and the right solution is to set
> the MTT parameter high. Otherwise you never know what you will get: your
> application may hang, block the IB interface, run a bit slower, or run
> very slowly.
>
>
> http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem


As I wrote: I'm aware of these FAQ entries, but you can't set the log_num_mtt
parameter if you're using a Debian/vanilla kernel: the mlx4_core module
does not offer this parameter.

MfG/Sincerely,
Stefan Friedel
--
IWR * 523 * INF 368 * 69120 Heidelberg
T +49 6221 548240 * F +49 6221 545224
stefan.frie...@iwr.uni-heidelberg.de




Re: [OMPI users] openmpi, 1.6.3, mlx4_core, log_num_mtt and Debian/vanilla kernel

2013-02-21 Thread Paul Kapinos
The MTT parameter mess is well known, and the right solution is to set the MTT
parameter high. Otherwise you never know what you will get: your application
may hang, block the IB interface, run a bit slower, or run very slowly.


http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem
http://www.open-mpi.org/community/lists/devel/2012/08/11417.php
http://montecarlo.vtt.fi/mtg/2012_Madrid/Hans_Hammer2.pdf
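How high is "high"? As we read the FAQ entry above, the mlx4 registration limit is max_reg_mem = 2^log_num_mtt * 2^log_mtts_per_seg * PAGE_SIZE, and it should be at least twice the physical memory. The Python sketch below searches for the smallest log_num_mtt meeting that target. The inputs (128 GiB of RAM, log_mtts_per_seg = 3, 4 KiB pages) are assumptions taken from this thread, not values read from a driver.

```python
# Sketch: smallest log_num_mtt that lets mlx4 register at least
# `factor` x physical memory, assuming the FAQ formula
#   max_reg_mem = 2**log_num_mtt * 2**log_mtts_per_seg * page_size
# The example inputs (128 GiB RAM, log_mtts_per_seg=3, 4 KiB pages)
# are assumptions from this thread, not queried from hardware.


def max_reg_mem(log_num_mtt: int, log_mtts_per_seg: int, page_size: int = 4096) -> int:
    """Bytes of memory the HCA can register under the FAQ formula."""
    return (2 ** log_num_mtt) * (2 ** log_mtts_per_seg) * page_size


def min_log_num_mtt(phys_mem: int, log_mtts_per_seg: int,
                    page_size: int = 4096, factor: int = 2) -> int:
    """Smallest log_num_mtt with max_reg_mem >= factor * phys_mem."""
    target = factor * phys_mem
    n = 0
    while max_reg_mem(n, log_mtts_per_seg, page_size) < target:
        n += 1
    return n


if __name__ == "__main__":
    GiB = 1024 ** 3
    n = min_log_num_mtt(128 * GiB, log_mtts_per_seg=3)
    print(n)                           # 23
    print(max_reg_mem(n, 3) // GiB)    # 256 (GiB registrable)
```

With these inputs the search stops at log_num_mtt = 23, i.e. 256 GiB of registrable memory. That is the knob the FAQ tells you to raise, and the one the Debian/vanilla mlx4_core in this thread does not expose.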

On 02/21/13 11:53, Stefan Friedel wrote:

> Is there a way to tell openmpi-1.6.3 to use the OFED module from the
> vanilla kernel and not to rely on log_num_mtt for the
> "do-we-have-enough-registered-mem" computation for Mellanox HCAs? Any
> other idea/hint?



--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915





[OMPI users] openmpi, 1.6.3, mlx4_core, log_num_mtt and Debian/vanilla kernel

2013-02-21 Thread Stefan Friedel

Good morning,
I'm struggling with setting up openmpi-1.6.3 on top of Debian
wheezy/testing together with Mellanox/OFED/mlx4 memory pinning. The
cluster is equipped with Mellanox MT26428 HCAs and runs the Debian
3.2.35-2 x86_64 kernel on nodes with 4x 8-core AMD Opteron 6212 and
128 GB of memory.

I'm aware of the FAQ entries about mlx4_core module parameters
(log_num_mtt etc.), but the module in Debian kernels (and in kernel.org
kernels up to the recent 3.8) does not know anything about log_num_mtt.
This parameter is only available in the OFED RPMs for SLES/RHEL/OEL.
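Whether a particular mlx4_core build accepts log_num_mtt can be checked on a running node: Linux exposes readable module parameters as files under /sys/module/<module>/parameters/. A minimal Python sketch, assuming mlx4_core is loaded and declares its parameters as readable. If the module is not loaded, a missing file is ambiguous.

```python
# Sketch: report which MTT-related parameters the loaded mlx4_core
# module exposes via sysfs. Assumption: mlx4_core is loaded and exports
# readable parameters under /sys/module/mlx4_core/parameters/ (standard
# Linux behaviour for parameters declared with non-zero permissions).
from pathlib import Path

PARAM_DIR = Path("/sys/module/mlx4_core/parameters")


def mtt_params(param_dir: Path = PARAM_DIR) -> dict:
    """Map each MTT parameter name to its value, or None if not exposed."""
    result = {}
    for name in ("log_num_mtt", "log_mtts_per_seg"):
        path = param_dir / name
        # A missing file means this module build has no such parameter
        # (or the module is not loaded at all).
        result[name] = path.read_text().strip() if path.is_file() else None
    return result


if __name__ == "__main__":
    for name, value in mtt_params().items():
        print(f"{name}: {value if value is not None else 'not exposed'}")
```

On the Debian 3.2 kernel described here, one would expect log_mtts_per_seg to show 3 and log_num_mtt to be reported as not exposed.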

Jobs started with the default environment fail. log_mtts_per_seg is a
valid parameter of mlx4_core in the Debian kernel and is set to 3;
log_num_mtt is not a valid parameter of mlx4_core and is set to 20 in
btl_openib.c, which produces the warning "...Your MPI job will continue,
but may be behave poorly and/or hang...". A simple benchmark then runs
for hours instead of returning a result after a few minutes. On the same
hardware with Debian Squeeze and openmpi-1.4.5, this job runs flawlessly.
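Plugging the numbers from this report into the FAQ formula max_reg_mem = 2^log_num_mtt * 2^log_mtts_per_seg * PAGE_SIZE shows why Open MPI warns. This is a sketch assuming 4 KiB pages. log_num_mtt = 20 is the value btl_openib.c uses per the description above, not a value reported by the driver.

```python
# Sketch: registrable memory under the FAQ formula
#   max_reg_mem = 2**log_num_mtt * 2**log_mtts_per_seg * page_size
# using the values reported in this thread. Assumptions: 4 KiB pages;
# log_num_mtt = 20 is the btl_openib.c value, not a driver parameter.

GiB = 1024 ** 3


def max_reg_mem(log_num_mtt: int, log_mtts_per_seg: int, page_size: int = 4096) -> int:
    """Bytes of memory the HCA can register under the FAQ formula."""
    return (2 ** log_num_mtt) * (2 ** log_mtts_per_seg) * page_size


phys_mem = 128 * GiB                                   # memory from the original post
reg = max_reg_mem(log_num_mtt=20, log_mtts_per_seg=3)  # 2**35 bytes

print(reg // GiB)            # 32
print(reg >= 2 * phys_mem)   # False: far below the FAQ's 2x-RAM recommendation
```

32 GiB is only a quarter of the 128 GB installed, so under this formula most of the memory cannot be registered at once.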

Is there a way to tell openmpi-1.6.3 to use the OFED module from the
vanilla kernel and not to rely on log_num_mtt for the
"do-we-have-enough-registered-mem" computation for Mellanox HCAs? Any
other idea/hint?

MfG/Sincerely,
Stefan Friedel
--
IWR * 523 * INF 368 * 69120 Heidelberg
T +49 6221 548240 * F +49 6221 545224
stefan.frie...@iwr.uni-heidelberg.de

