Re: [OMPI users] new map-by-obj has a problem

2014-03-02 Thread tmishima
Hi Ralph, I have tested your fix - 30895. I'm afraid to say I found a mistake. You should include "SETTING BIND_TO_NONE" in the above if-clause at lines 74, 256, 511, and 656. Otherwise, only the warning message disappears but binding to core is still overwritten by binding to none. Please see

Re: [OMPI users] Compiling Open MPI 1.7.4 using PGI 14.2 and Mellanox HCOLL enabled

2014-03-02 Thread Filippo Spiga
Dear Ralph, I still need a workaround to compile using PGI and --with-hcoll. I tried a nightly snapshot last week; I will try the latest one again, and if something changes I will let you know. Regards, Filippo On Feb 26, 2014, at 6:16 PM, Ralph Castain wrote: > Perhaps you
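
For reference, a minimal sketch of a configure line for this combination, assuming PGI 14.2 compilers on PATH and a typical MOFED hcoll install location (the --with-hcoll path is a guess; adjust to your system):

    ./configure CC=pgcc CXX=pgc++ FC=pgfortran \
        --prefix=$HOME/openmpi-1.7.4-pgi \
        --with-hcoll=/opt/mellanox/hcoll
    make -j8 && make install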

Re: [OMPI users] Heterogeneous cluster problem - mixing AMD and Intel nodes

2014-03-02 Thread Victor
Thanks for your reply. There are some updates, but it was too late last night to post them. I now have the AMD/Intel heterogeneous cluster up and running. The initial problem was that when I installed OpenMPI on the AMD nodes, the library paths were set to a different location than on the Intel
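
A quick way to confirm that every node resolves the same install (hostnames below are hypothetical):

    # run against each node; all of them should report the same path and version
    for h in node01 node02 amd01 amd02; do
        ssh $h 'which mpirun; mpirun --version | head -1'
    done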

Re: [OMPI users] OpenMPI job initializing problem

2014-03-02 Thread Jeff Squyres (jsquyres)
Both 1.6.x and 1.7.x/1.8.x will need verbs.h to use the native verbs network stack. You can use emulated TCP over IB (e.g., using the OMPI TCP BTL), but it's nowhere near as fast/efficient as the native verbs network stack. On Mar 2, 2014, at 10:47 AM, Beichuan Yan
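
For example, forcing the TCP BTL over the IPoIB interface looks like this (the interface name ib0 is the common default, but it is system-dependent):

    mpirun --mca btl tcp,sm,self \
           --mca btl_tcp_if_include ib0 \
           -np 16 ./my_app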

Re: [OMPI users] OpenMPI job initializing problem

2014-03-02 Thread Jeff Squyres (jsquyres)
On Mar 2, 2014, at 10:18 AM, Gustavo Correa wrote: > Make sure you have any ofed/openib "devel" packages installed, > in case they exist and yum lists them. > This may be a possible reason for missing header files. +1 Look for libibverbs-devel. -- Jeff Squyres
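
On a RHEL/CentOS-style system that check would be something like:

    yum search libibverbs
    sudo yum install libibverbs-devel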

Re: [OMPI users] OpenMPI job initializing problem

2014-03-02 Thread Gustavo Correa
FWIW, I never added any verbs or openib switches to the OMPI configure command line, and configure found them and built with Infiniband support. Up to OMPI 1.6.5 (I didn't try the 1.7 series). Unless you are installing your ofed/IB packages in non-standard places, I guess configure will find
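
One way to verify this after a plain build, with no verbs switches at all (the prefix shown is hypothetical):

    ./configure --prefix=/opt/openmpi-1.6.5 && make && make install
    /opt/openmpi-1.6.5/bin/ompi_info | grep openib
    # expect something like: MCA btl: openib (MCA v2.0, API v2.0, Component v1.6.5)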

Re: [OMPI users] OpenMPI job initializing problem

2014-03-02 Thread Jeff Squyres (jsquyres)
FWIW: /usr/include/infiniband/verbs.h is the normal location for verbs.h. Don't add --with-verbs=/usr/include/infiniband; it won't work. Please send all the information listed here and we can have a look at your logs: http://www.open-mpi.org/community/help/ On Mar 2, 2014, at 7:16 AM,
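
If you pass the flag at all, it should name the install prefix rather than the include directory; a sketch, assuming verbs.h really is under /usr/include/infiniband:

    ./configure --with-verbs=/usr ...
    # configure then looks for include/infiniband/verbs.h beneath that prefix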

Re: [OMPI users] OpenMPI job initializing problem

2014-03-02 Thread Ralph Castain
It should have been looking in the same place - check to see where you installed the infiniband support. Is "verbs.h" under your /usr/include? In looking at the code, the 1.6 series searched for verbs.h in /usr/include/infiniband. The 1.7 series also does (though it doesn't look quite right
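
A quick check on the build host:

    ls -l /usr/include/infiniband/verbs.h
    # if it is not there, locate the actual install:
    find / -name verbs.h 2>/dev/null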

Re: [OMPI users] OpenMPI job initializing problem

2014-03-02 Thread Beichuan Yan
Ralph and Gus, 1. Thank you for your suggestion. I built Open MPI 1.6.5 with the following command: ./configure --prefix=/work4/projects/openmpi/openmpi-1.6.5-gcc-compilers-4.7.3 --with-tm=/opt/pbs/default --with-openib= --with-openib-libdir=/usr/lib64 In my job script, I need to specify the
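
Assuming the truncated sentence refers to runtime paths, a PBS job script would typically export the matching prefix (the script below is a hypothetical sketch; only the prefix comes from the configure line above):

    #!/bin/bash
    #PBS -l nodes=2:ppn=8
    MPIHOME=/work4/projects/openmpi/openmpi-1.6.5-gcc-compilers-4.7.3
    export PATH=$MPIHOME/bin:$PATH
    export LD_LIBRARY_PATH=$MPIHOME/lib:$LD_LIBRARY_PATH
    mpirun -np 16 ./my_app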

Re: [OMPI users] Heterogeneous cluster problem - mixing AMD and Intel nodes

2014-03-02 Thread Brice Goglin
What's your mpirun or mpiexec command-line? The error "BTLs attempted: self sm tcp" says that it didn't even try the MX BTL (for Open-MX). Did you use the MX MTL instead? Are you sure that you actually use Open-MX when not mixing AMD and Intel nodes? Brice On 02/03/2014 08:06, Victor wrote:
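
For reference, the two ways of selecting MX/Open-MX in Open MPI 1.6 (application name hypothetical):

    # via the MX BTL
    mpirun --mca btl mx,sm,self -np 8 ./my_app
    # via the MX MTL through the cm PML
    mpirun --mca pml cm --mca mtl mx -np 8 ./my_app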

[OMPI users] Heterogeneous cluster problem - mixing AMD and Intel nodes

2014-03-02 Thread Victor
I got 4 x AMD A-10 6800K nodes on loan for a few months and added them to my existing Intel nodes. All nodes share the relevant directories via NFS. I have OpenMPI 1.6.5, which was built with Open-MX 1.5.3 support, networked via GbE. All nodes run Ubuntu 12.04. Problem: I can run a job EITHER on