It should have been looking in the same place - check where you installed
the InfiniBand support. Is "verbs.h" somewhere under your /usr/include?

Looking at the code, the 1.6 series searched for verbs.h in
/usr/include/infiniband. The 1.7 series does so as well (though that logic
doesn't look quite right to me), so it wouldn't hurt to point it there yourself:

--with-verbs=/usr/include/infiniband --with-verbs-libdir=/usr/lib64/infiniband

or something like that
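
For example, a quick way to check where (or whether) the header is installed
is something like:

  find /usr/include -name verbs.h

On most RHEL 6 systems that should turn up /usr/include/infiniband/verbs.h;
if it finds nothing, the InfiniBand development headers are probably not
installed at all.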


On Mar 1, 2014, at 11:56 PM, Beichuan Yan <beichuan....@colorado.edu> wrote:

> Ralph and Gus,
> 
> 1. Thank you for your suggestion. I built Open MPI 1.6.5 with the following 
> command: 
> ./configure 
> --prefix=/work4/projects/openmpi/openmpi-1.6.5-gcc-compilers-4.7.3 
> --with-tm=/opt/pbs/default --with-openib=  --with-openib-libdir=/usr/lib64
> 
> In my job script, I need to specify the IB subnet like this:
> TCP="--mca btl_tcp_if_include 10.148.0.0/16"
> mpirun $TCP -np 64 -hostfile $PBS_NODEFILE ./paraEllip3d input.txt
> 
> Then my job gets initialized and runs correctly each time!
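> 
> For reference, a minimal PBS Pro job script along those lines would look
> roughly like this (the job name, resource request, and walltime are
> placeholders, not my real settings):
> 
> #!/bin/bash
> #PBS -N paraEllip3d                    # job name (placeholder)
> #PBS -l select=8:ncpus=8:mpiprocs=8    # 64 MPI ranks total (placeholder)
> #PBS -l walltime=01:00:00              # walltime (placeholder)
> cd $PBS_O_WORKDIR
> TCP="--mca btl_tcp_if_include 10.148.0.0/16"
> mpirun $TCP -np 64 -hostfile $PBS_NODEFILE ./paraEllip3d input.txt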
> 
> 2. However, when I build Open MPI 1.7.4 with a similar command (in order 
> to test/compare the shared-memory performance of Open MPI):
> ./configure 
> --prefix=/work4/projects/openmpi/openmpi-1.7.4-gcc-compilers-4.7.3 
> --with-tm=/opt/pbs/default --with-verbs=  --with-verbs-libdir=/usr/lib64
> 
> The configure step fails as follows:
> ============================================================================
> == Modular Component Architecture (MCA) setup
> ============================================================================
> checking for subdir args...  
> '--prefix=/work4/projects/openmpi/openmpi-1.7.4-gcc-compilers-4.7.3' 
> '--with-tm=/opt/pbs/default' '--with-verbs=' '--with-verbs-libdir=/usr/lib64' 
> 'CC=gcc' 'CXX=g++'
> checking --with-verbs value... simple ok (unspecified)
> checking --with-verbs-libdir value... sanity check ok (/usr/lib64)
> configure: WARNING: Could not find verbs.h in the usual locations under
> configure: error: Cannot continue
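> 
> If I understand it correctly, on RHEL the verbs.h header normally comes
> from the libibverbs development package (assuming the stock Red Hat
> InfiniBand stack rather than a vendor OFED install), so a quick check
> would be:
> 
> rpm -q libibverbs-devel             # is the header package installed?
> yum install libibverbs-devel        # as root, if the query says it is missing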
> 
> Our system is Red Hat 6.4. Do we need to install additional InfiniBand packages? 
> Can you please advise?
> 
> Thanks,
> Beichuan Yan
> 
> 
> -----Original Message-----
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gus Correa
> Sent: Friday, February 28, 2014 15:59
> To: Open MPI Users
> Subject: Re: [OMPI users] OpenMPI job initializing problem
> 
> Hi Beichuan,
> 
> To add to what Ralph said,
> the RHEL OpenMPI package probably wasn't built with PBS Pro support 
> either.
> Besides, OMPI 1.5.4 (RHEL version) is old.
> 
> **
> 
> You will save yourself time and grief if you read the installation FAQs 
> before you install from the source tarball:
> 
> http://www.open-mpi.org/faq/?category=building
> 
> However, as Ralph said, that is your best bet, and it is quite easy to get 
> right.
> 
> 
> See this FAQ on how to build with PBS Pro support:
> 
> http://www.open-mpi.org/faq/?category=building#build-rte-tm
> 
> And this one on how to build with Infiniband support:
> 
> http://www.open-mpi.org/faq/?category=building#build-p2p
> 
> Here is how to select the installation directory (--prefix):
> 
> http://www.open-mpi.org/faq/?category=building#easy-build
> 
> Here is how to select the compilers (gcc, g++, and gfortran are fine):
> 
> http://www.open-mpi.org/faq/?category=building#build-compilers
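> 
> Putting those pieces together, a configure line for a PBS Pro + InfiniBand
> cluster might look roughly like this (the install prefix and the PBS path
> are just examples; adjust them to your system):
> 
> ./configure --prefix=/path/to/openmpi-install \
>     --with-tm=/opt/pbs/default \
>     --with-openib --with-openib-libdir=/usr/lib64 \
>     CC=gcc CXX=g++ F77=gfortran FC=gfortran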
> 
> I hope this helps,
> Gus Correa
> 
> On 02/28/2014 12:36 PM, Ralph Castain wrote:
>> Almost certainly, the redhat package wasn't built with matching 
>> infiniband support and so we aren't picking it up. I'd suggest 
>> downloading the latest 1.7.4 or 1.7.5 nightly tarball, or even the 
>> latest 1.6 tarball if you want the stable release, and build it 
>> yourself so you *know* it was built for your system.
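>> 
>> The build itself is the usual tarball routine, roughly (the version and
>> install prefix below are just examples):
>> 
>> tar xjf openmpi-1.6.5.tar.bz2
>> cd openmpi-1.6.5
>> ./configure --prefix=$HOME/openmpi-1.6.5 ...   # plus the PBS/IB options discussed above
>> make -j 4
>> make install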
>> 
>> 
>> On Feb 28, 2014, at 9:20 AM, Beichuan Yan <beichuan....@colorado.edu 
>> <mailto:beichuan....@colorado.edu>> wrote:
>> 
>>> Hi there,
>>> I am running jobs on a cluster with an InfiniBand interconnect. OpenMPI 
>>> v1.5.4 is installed there (via the Red Hat 6 yum package). My problem is 
>>> that although my jobs get queued and started by PBS Pro quickly, most of 
>>> the time they don't actually run (only occasionally do they run) and 
>>> instead give error info like this, even though plenty of CPU/IB resources 
>>> are available:
>>> [r2i6n7][[25564,1],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
>>> connect() to 192.168.159.156 failed: Connection refused (111)
>>> 
>>> And even when a job does get started and runs well, it prints this warning:
>>> 
>>> --------------------------------------------------------------------------
>>> WARNING: There was an error initializing an OpenFabrics device.
>>>   Local host:   r1i2n6
>>>   Local device: mlx4_0
>>> --------------------------------------------------------------------------
>>> 
>>> 1. Here is the info from one of the compute nodes:
>>> -bash-4.1$ /sbin/ifconfig
>>> eth0  Link encap:Ethernet  HWaddr 8C:89:A5:E3:D2:96
>>>       inet addr:192.168.159.205  Bcast:192.168.159.255  Mask:255.255.255.0
>>>       inet6 addr: fe80::8e89:a5ff:fee3:d296/64 Scope:Link
>>>       UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>       RX packets:48879864 errors:0 dropped:0 overruns:17 frame:0
>>>       TX packets:39286060 errors:0 dropped:0 overruns:0 carrier:0
>>>       collisions:0 txqueuelen:1000
>>>       RX bytes:54771093645 (51.0 GiB)  TX bytes:37512462596 (34.9 GiB)
>>>       Memory:dfc00000-dfc20000
>>> 
>>> Ifconfig uses the ioctl access method to get the full address
>>> information, which limits hardware addresses to 8 bytes.
>>> Because Infiniband address has 20 bytes, only the first 8 bytes are
>>> displayed correctly.
>>> Ifconfig is obsolete! For replacement check ip.
>>> 
>>> ib0   Link encap:InfiniBand  HWaddr 80:00:00:48:FE:C0:00:00:00:00:00:00:00:00:00:00:00:00:00:00
>>>       inet addr:10.148.0.114  Bcast:10.148.255.255  Mask:255.255.0.0
>>>       inet6 addr: fe80::202:c903:fb:3489/64 Scope:Link
>>>       UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
>>>       RX packets:43807414 errors:0 dropped:0 overruns:0 frame:0
>>>       TX packets:10534050 errors:0 dropped:24 overruns:0 carrier:0
>>>       collisions:0 txqueuelen:256
>>>       RX bytes:47824448125 (44.5 GiB)  TX bytes:44764010514 (41.6 GiB)
>>> 
>>> lo    Link encap:Local Loopback
>>>       inet addr:127.0.0.1  Mask:255.0.0.0
>>>       inet6 addr: ::1/128 Scope:Host
>>>       UP LOOPBACK RUNNING  MTU:16436  Metric:1
>>>       RX packets:17292 errors:0 dropped:0 overruns:0 frame:0
>>>       TX packets:17292 errors:0 dropped:0 overruns:0 carrier:0
>>>       collisions:0 txqueuelen:0
>>>       RX bytes:1492453 (1.4 MiB)  TX bytes:1492453 (1.4 MiB)
>>> 
>>> -bash-4.1$ chkconfig --list iptables
>>> iptables  0:off  1:off  2:on  3:on  4:on  5:on  6:off
>>> 
>>> 2. I tried various parameters below, but none of them ensures that my 
>>> jobs get initialized and run:
>>> #TCP="--mca btl ^tcp"
>>> #TCP="--mca btl self,openib"
>>> #TCP="--mca btl_tcp_if_exclude lo"
>>> #TCP="--mca btl_tcp_if_include eth0"
>>> #TCP="--mca btl_tcp_if_include eth0, ib0"
>>> #TCP="--mca btl_tcp_if_exclude 192.168.0.0/24,127.0.0.1/8 --mca 
>>> oob_tcp_if_exclude 192.168.0.0/24,127.0.0.1/8"
>>> #TCP="--mca btl_tcp_if_include 10.148.0.0/16"
>>> mpirun $TCP -hostfile $PBS_NODEFILE -np 8 ./paraEllip3d input.txt
>>> 
>>> 3. Then I turned to Intel MPI, which surprisingly starts and runs my job 
>>> correctly each time (it is a little slower than Open MPI, maybe 15% 
>>> slower, but it works each time).
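>>> 
>>> As a side note on item 2: one way to see which BTL components the
>>> installed Open MPI actually provides (for instance, whether openib was
>>> built in at all) and how the TCP BTL parameters are set is ompi_info,
>>> e.g.:
>>> 
>>> ompi_info | grep btl
>>> ompi_info --param btl tcp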
>>> Can you please advise? Many thanks.
>>> Sincerely,
>>> Beichuan Yan
>> 
>> 
>> 
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
