Re: [OMPI users] openmpi, 1.6.3, mlx4_core, log_num_mtt and Debian/vanilla kernel
On Thu, Feb 21, 2013 at 12:23:14PM +0100, Paul Kapinos wrote:
> The MTT parameter mess is well known, and the proper solution is to set the MTT parameter high. Otherwise you never know what you will get: your application may hang, block the IB interface, run a bit slower, or run very slowly...
> http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem

As I wrote, I'm aware of these FAQ entries, but you can't set the log_num_mtt parameter if you're using a Debian/vanilla kernel: the mlx4_core module does not offer this parameter.

MfG/Sincerely,
Stefan Friedel
--
IWR * 523 * INF 368 * 69120 Heidelberg
T +49 6221 548240 * F +49 6221 545224
stefan.frie...@iwr.uni-heidelberg.de
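To check which situation a given node is in: a loaded kernel module normally exposes its readable parameters as files under /sys/module/<module>/parameters/. On a Debian/vanilla kernel, log_mtts_per_seg should therefore show up there while log_num_mtt should not. The following is a minimal Python sketch under that assumption. The helper name read_module_param is hypothetical, and the base directory is a parameter only so that the function can be tried against a scratch directory.

```python
import os
from typing import Optional

# Usual sysfs location where loaded modules expose their parameters.
SYSFS_MODULE_DIR = "/sys/module"


def read_module_param(module: str, param: str,
                      base: str = SYSFS_MODULE_DIR) -> Optional[int]:
    """Return an integer module parameter, or None if the module does not
    expose it (e.g. log_num_mtt on a Debian/vanilla mlx4_core)."""
    path = os.path.join(base, module, "parameters", param)
    try:
        with open(path) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        # Module not loaded, or the parameter does not exist in this build.
        return None


if __name__ == "__main__":
    for name in ("log_num_mtt", "log_mtts_per_seg"):
        value = read_module_param("mlx4_core", name)
        shown = value if value is not None else "not available"
        print(f"mlx4_core.{name} = {shown}")
```

If the first line prints "not available", the Open MPI check cannot read the real value and has to fall back on an assumption.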
Re: [OMPI users] openmpi, 1.6.3, mlx4_core, log_num_mtt and Debian/vanilla kernel
The MTT parameter mess is well known, and the proper solution is to set the MTT parameter high. Otherwise you never know what you will get: your application may hang, block the IB interface, run a bit slower, or run very slowly...

http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem
http://www.open-mpi.org/community/lists/devel/2012/08/11417.php
http://montecarlo.vtt.fi/mtg/2012_Madrid/Hans_Hammer2.pdf

On 02/21/13 11:53, Stefan Friedel wrote:
> Is there a way to tell openmpi-1.6.3 to use the OFED module from the vanilla kernel and not to rely on log_num_mtt for the "do-we-have-enough-registered-mem" computation for Mellanox HCAs? Any other idea/hint?

--
Dipl.-Inform. Paul Kapinos - High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241/80-24915
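To estimate how high "high" needs to be: as I read the FAQ entry above, it computes registerable memory as 2^log_num_mtt × 2^log_mtts_per_seg × PAGE_SIZE and recommends that this figure be at least twice the physical memory. The following is a minimal sketch under those assumptions. The helper min_log_num_mtt is hypothetical, the factor of 2 is taken from that reading of the FAQ, and the example assumes 4 KiB pages. It solves for the smallest log_num_mtt that meets the target.

```python
GIB = 1 << 30  # bytes per GiB


def min_log_num_mtt(phys_mem: int, log_mtts_per_seg: int, page_size: int,
                    factor: int = 2) -> int:
    """Smallest n with 2**n * 2**log_mtts_per_seg * page_size >= factor * phys_mem.

    factor=2 encodes the "at least twice physical memory" target
    (an assumption based on the FAQ); all sizes are in bytes.
    """
    target = factor * phys_mem
    bytes_per_unit = (1 << log_mtts_per_seg) * page_size
    units_needed = -(-target // bytes_per_unit)  # ceiling division
    # Smallest n with 2**n >= units_needed, computed exactly on integers.
    return max(units_needed - 1, 0).bit_length()


if __name__ == "__main__":
    # Example node: 128 GiB RAM, log_mtts_per_seg = 3, 4 KiB pages (assumed).
    print(min_log_num_mtt(128 * GIB, log_mtts_per_seg=3, page_size=4096))  # 23
```

For 128 GiB of RAM with log_mtts_per_seg = 3 and 4 KiB pages, this gives 23. The value only helps with an mlx4_core build that actually accepts log_num_mtt, such as the OFED packages.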
[OMPI users] openmpi, 1.6.3, mlx4_core, log_num_mtt and Debian/vanilla kernel
Good morning,

I'm struggling with the setup of openmpi-1.6.3 on top of Debian wheezy/testing and Mellanox OFED/mlx4 memory pinning. The cluster is equipped with Mellanox MT26428 HCAs and runs Debian kernel 3.2.35-2 x86_64 on 4x 8-core AMD Opteron 6212 with 128G of memory.

I'm aware of the FAQ entries about the mlx4_core module parameters (log_num_mtt etc.), but the module in Debian kernels (and in kernel.org kernels up to the recent 3.8) does not know anything about log_num_mtt. This parameter is only available in the OFED RPMs for SLES/RHEL/OEL.

Jobs started with the default environment fail. log_mtts_per_seg is a valid parameter of mlx4_core in the Debian kernel and is set to 3; log_num_mtt is not a valid parameter of mlx4_core, and btl_openib.c sets it to 20 ("...Your MPI job will continue, but may be behave poorly and/or hang..."). A simple benchmark runs for hours instead of returning a result after a few minutes. On the same hardware with Debian Squeeze and openmpi-1.4.5, this job runs flawlessly.

Is there a way to tell openmpi-1.6.3 to use the OFED module from the vanilla kernel and not to rely on log_num_mtt for the "do-we-have-enough-registered-mem" computation for Mellanox HCAs? Any other idea/hint?

MfG/Sincerely,
Stefan Friedel
--
IWR * 523 * INF 368 * 69120 Heidelberg
T +49 6221 548240 * F +49 6221 545224
stefan.frie...@iwr.uni-heidelberg.de
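Plugging the values from this message into the registered-memory formula from the Open MPI FAQ (max_reg_mem = 2^log_num_mtt × 2^log_mtts_per_seg × PAGE_SIZE) shows why the check complains. The sketch makes three assumptions. It uses 4 KiB pages, as is usual on x86_64. It reads "128G" as 128 GiB. It takes 20 as the log_num_mtt value that the message reports btl_openib.c using when the parameter is missing.

```python
GIB = 1 << 30  # bytes per GiB


def max_reg_mem(log_num_mtt: int, log_mtts_per_seg: int, page_size: int) -> int:
    """Registerable memory in bytes, per the Open MPI FAQ formula."""
    return (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size


# Values from the message; page_size = 4096 is an assumption (x86_64 default).
reg = max_reg_mem(log_num_mtt=20, log_mtts_per_seg=3, page_size=4096)
phys = 128 * GIB

print(reg // GIB)   # 32
print(reg / phys)   # 0.25
```

Under these assumptions, only 32 GiB, about a quarter of physical memory, would count as registerable. That falls well short of the physical memory size, which is consistent with the warning and the slowdown described above.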