Re: [OMPI users] How can I pass my user limits (stacksize, etc.) to mpirun in OpenMPI

2008-09-09 Thread Jeff Squyres
There are several factors that can come into play here.  See this FAQ  
entry about registered memory limits (the same concepts apply to the  
other limits):


http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages-more
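
(For reference, a minimal sketch of one common fix along the lines of that FAQ entry: raise the limits system-wide in /etc/security/limits.conf on every node so that non-interactive logins inherit them. The values below are purely illustrative.)

-----
# /etc/security/limits.conf on every node (values illustrative)
*   soft   memlock   unlimited
*   hard   memlock   unlimited
*   soft   stack     unlimited
*   hard   stack     unlimited
-----

After editing, log out and back in so that new sessions (including the ones mpirun's daemons get) pick up the new limits.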


On Sep 9, 2008, at 7:04 PM, Amidu Oloso wrote:

mpirun under OpenMPI is not picking up the limit settings from the user  
environment. Is there a way to do this, short of wrapping my  
executable in a script where my limits are set and then invoking  
mpirun on that script?


Thanks.

-Hamid



--
Jeff Squyres
Cisco Systems



[OMPI users] How can I pass my user limits (stacksize, etc.) to mpirun in OpenMPI

2008-09-09 Thread Amidu Oloso
mpirun under OpenMPI is not picking up the limit settings from the user 
environment. Is there a way to do this, short of wrapping my executable 
in a script where my limits are set and then invoking mpirun on that script?


Thanks.

-Hamid


Re: [OMPI users] users Digest, Vol 1000, Issue 1

2008-09-09 Thread Jeff Squyres

On Sep 9, 2008, at 3:05 PM, Christopher Tanner wrote:

I think I've found the problem / solution. With Ubuntu, there's a  
program called 'ldconfig' that updates the dynamic linker run-time  
bindings. Since Open MPI was compiled to use dynamic linking, these  
have to be updated. Thus, these commands have to be run on all of  
the nodes


$ sudo ldconfig -v /usr/local/lib
$ sudo ldconfig -v /usr/local/lib/openmpi


Note that you shouldn't need the 2nd of those -- the only things that  
should be in /usr/local/lib/openmpi should be plugins.


FWIW, I do not believe that this is a side effect of the Open MPI  
installation.  The libraries you cited are part of the Intel compiler  
suite, not Open MPI.  The above would work if the Intel libraries are  
also installed in /usr/local/lib.  More specifically, if you had OMPI  
and the Intel compilers installed in different directories, you'd  
either need to run ldconfig on both of them or adjust your  
LD_LIBRARY_PATH to include both.


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] users Digest, Vol 1000, Issue 1

2008-09-09 Thread Jeremy Stout
On clusters where I'm using the Intel compilers and OpenMPI, I set up
the compiler directory (usually /opt/intel) as an NFS export. The
computation nodes then mount that export. Next, I add the following
lines to the ld.so.conf file and distribute it to the computation
nodes:
/opt/intel/cce/version_number/lib/em64t
/opt/intel/fce/version_number/lib/em64t

The exact paths will depend on the location and version of the compiler
set you are using. Run '/sbin/ldconfig' on each node and you should be
good to go.
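
(A sketch of how that distribution step might look from the head node, assuming root access; the node names are hypothetical and "version_number" is a placeholder for your actual compiler version:)

-----
# append the Intel library paths to the linker configuration
echo "/opt/intel/cce/version_number/lib/em64t" >> /etc/ld.so.conf
echo "/opt/intel/fce/version_number/lib/em64t" >> /etc/ld.so.conf

# push the file to each computation node and refresh its linker cache
for node in node01 node02 node03; do
    scp /etc/ld.so.conf ${node}:/etc/ld.so.conf
    ssh ${node} /sbin/ldconfig
done
-----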

You can also try updating the library path in your submission file.

Jeremy Stout

On Tue, Sep 9, 2008 at 2:58 PM, Christopher Tanner wrote:
> Jeremy -
>
> Thanks for the help - this bit of advice came up quite a bit through
> internet searches. However, I made sure that the LD_LIBRARY_PATH was set and
> correct on all nodes -- and the error persists.
>
> Any other possible solutions? Thanks.
>
> ---
> Chris Tanner
> Space Systems Design Lab
> Georgia Institute of Technology
> christopher.tan...@gatech.edu
> ---
>
>
>
> On Sep 9, 2008, at 12:00 PM, users-requ...@open-mpi.org wrote:
>
>>
>> The library you specified in your post (libimf.so) is part of the
>> Intel Compiler Suite (fce and cce). You'll need to make those
>> libraries available to your computation nodes and update the
>> LD_LIBRARY_PATH accordingly.
>>
>> Jeremy Stout
>


Re: [OMPI users] users Digest, Vol 1000, Issue 1

2008-09-09 Thread Christopher Tanner

Jeremy -

I think I've found the problem / solution. With Ubuntu, there's a  
program called 'ldconfig' that updates the dynamic linker run-time  
bindings. Since Open MPI was compiled to use dynamic linking, these  
have to be updated. Thus, these commands have to be run on all of the  
nodes


$ sudo ldconfig -v /usr/local/lib
$ sudo ldconfig -v /usr/local/lib/openmpi

When installing from an RPM (on Red Hat) or from a .deb package (on  
Debian), this linking is done automatically at the end of the install.  
However, if you compile from source, you have to run ldconfig manually.


Now Open MPI runs fine. :)

---
Chris Tanner
Space Systems Design Lab
Georgia Institute of Technology
christopher.tan...@gatech.edu
---

The library you specified in your post (libimf.so) is part of the
Intel Compiler Suite (fce and cce). You'll need to make those
libraries available to your computation nodes and update the
LD_LIBRARY_PATH accordingly.

Jeremy Stout


Re: [OMPI users] users Digest, Vol 1000, Issue 1

2008-09-09 Thread Jeff Squyres
You might want to double check this; it's an easy thing to test  
incorrectly.


What you want to check is that the LD_LIBRARY_PATH is set properly for  
*non-interactive logins* (I assume you are using the rsh/ssh launcher  
for Open MPI, vs. using a resource manager such as SLURM, Torque,  
etc.).  For example, try this:


-
shell$ ssh othernode env | grep LD_LIBRARY_PATH
-

This runs "env" on the other node and will show you what the  
LD_LIBRARY_PATH is over there.   This is what you want to check  
includes the right paths for the Intel libraries.  Note that it is  
different than:


-
shell$ ssh othernode
othernode$ env | grep LD_LIBRARY_PATH
-

The two can differ because shell startup files may behave differently  
for interactive and non-interactive logins; it depends on your local  
system setup.
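
(On bash-based systems, one way to make this work is to export the paths near the top of ~/.bashrc on every node, before any "exit if not interactive" test that the distribution's default .bashrc may contain. A sketch, with the directories standing in for your actual Intel and Open MPI install locations:)

-----
# near the top of ~/.bashrc on every node (paths are placeholders)
export LD_LIBRARY_PATH=/usr/local/lib:/opt/intel/fce/version_number/lib/em64t:$LD_LIBRARY_PATH
-----

Then re-run the non-interactive test above (ssh othernode env | grep LD_LIBRARY_PATH) to confirm the paths show up.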


Hope that helps.



On Sep 9, 2008, at 2:58 PM, Christopher Tanner wrote:


Jeremy -

Thanks for the help - this bit of advice came up quite a bit through  
internet searches. However, I made sure that the LD_LIBRARY_PATH was  
set and correct on all nodes -- and the error persists.


Any other possible solutions? Thanks.

---
Chris Tanner
Space Systems Design Lab
Georgia Institute of Technology
christopher.tan...@gatech.edu
---



On Sep 9, 2008, at 12:00 PM, users-requ...@open-mpi.org wrote:



The library you specified in your post (libimf.so) is part of the
Intel Compiler Suite (fce and cce). You'll need to make those
libraries available to your computation nodes and update the
LD_LIBRARY_PATH accordingly.

Jeremy Stout





--
Jeff Squyres
Cisco Systems



Re: [OMPI users] users Digest, Vol 1000, Issue 1

2008-09-09 Thread Christopher Tanner

Jeremy -

Thanks for the help - this bit of advice came up quite a bit through  
internet searches. However, I made sure that the LD_LIBRARY_PATH was  
set and correct on all nodes -- and the error persists.


Any other possible solutions? Thanks.

---
Chris Tanner
Space Systems Design Lab
Georgia Institute of Technology
christopher.tan...@gatech.edu
---



On Sep 9, 2008, at 12:00 PM, users-requ...@open-mpi.org wrote:



The library you specified in your post (libimf.so) is part of the
Intel Compiler Suite (fce and cce). You'll need to make those
libraries available to your computation nodes and update the
LD_LIBRARY_PATH accordingly.

Jeremy Stout




Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-09 Thread Prasanna Ranganathan
Hi Jeff/Paul,

 Thanks a lot for your replies.

 I am looking into upgrading to a newer Open MPI version. Since my main
parallel application uses a few custom-built libraries that recommend the
use of 1.1.2, I first need to check them for compatibility issues with the
newer version before I can upgrade.

 Paul,

The following is the output of ulimit -a

core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
file size   (blocks, -f) unlimited
pending signals (-i) 268288
max locked memory   (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files  (-n) 8192
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
stack size  (kbytes, -s) 8192
cpu time   (seconds, -t) unlimited
max user processes  (-u) 268288
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited


 As I mentioned before, I am able to run it successfully with 997 processes
around 6 out of 10 times, with the rest failing. I tried with 500 processes
and I do get the odd failure in that case too.
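
(For what it's worth, errno 113 on Linux is EHOSTUNREACH, i.e. "No route to host". A quick sanity check from the launch node, sketched below using the nodelist hostfile from the commands above and assuming one hostname per line, is to confirm that every host in the file is reachable:)

-----
# report any host in the hostfile that does not answer a single ping
while read host; do
    ping -c 1 -W 2 "$host" > /dev/null 2>&1 || echo "unreachable: $host"
done < nodelist
-----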

Regards,

Prasanna.



Re: [OMPI users] libimf.so Error

2008-09-09 Thread Jeremy Stout
On Tue, Sep 9, 2008 at 9:52 AM, Christopher Tanner wrote:
> I just installed Open MPI on our cluster and whenever I try to execute a
> process on more than one node, I get this error:
>
> $ mpirun -hostfile $HOSTFILE -n 1 hello_c
> orted: error while loading shared libraries: libimf.so: cannot open shared
> object file: No such file or directory
> ... followed by a whole bunch of timeout errors that I'm assuming were
> caused by the library error above.
>
> The cluster has 16 nodes and is running Ubuntu 8.04 Server. The Open MPI
> source was compiled with openib support using the Intel compilers:
> $ ./configure --prefix=/usr/local --with-openib=/usr/local/lib CC=icc
> CFLAGS=-m64 CXX=icpc CXXFLAGS=-m64 F77=ifort FFLAGS=-m64 FC=ifort
> FCFLAGS=-m64
>
> I've installed the Intel compilers on the master node only, but I've
> installed them in the /usr/local directory, which is accessible to all nodes
> via NFS. Similarly, I've compiled / installed Open MPI only on the master
> node, but in the NFS-shared /usr/local directory as well. Finally, I've
> compiled / installed all of the OpenFabrics libraries on the master node
> only but in the NFS-shared /usr/local/lib directory.
>
> I've run the iccvars.sh and ifortvar.sh scripts on each node to ensure that
> the environment variables were setup for the Intel compilers on each node.
> Additionally, I've modified the LD_LIBRARY_PATH variable on each node to
> include /usr/local/lib and /usr/local/lib/openmpi so that each node can see
> the Infiniband and OpenMPI libraries.
>
> If I only execute Open MPI on the master node, it works fine
> $ mpirun -hostfile $HOSTFILE -n 1 hello_c
> Hello, world, I am 0 of 1
>
> Sorry for the long post and thanks for your help in advance!
>
> ---
> Chris Tanner
> Space Systems Design Lab
> Georgia Institute of Technology
> christopher.tan...@gatech.edu
> ---

The library you specified in your post (libimf.so) is part of the
Intel Compiler Suite (fce and cce). You'll need to make those
libraries available to your computation nodes and update the
LD_LIBRARY_PATH accordingly.

Jeremy Stout


Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-09 Thread Paul Kapinos

Hi,

First, consider updating to a newer OpenMPI.

Second, look at your environment on the box where you start OpenMPI 
(where you run mpirun ...).


Type
ulimit -n
to see how many file descriptors your environment allows (ulimit -a 
shows all limits). Note that on older versions of OpenMPI (up to and 
including 1.2.6) every process needs its own file descriptor for each 
process started, IMHO. Maybe that is your problem? Does your HelloWorld 
run OK with some 500 processes?
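
(A sketch of how to check and, if needed, raise the limit; the value is only an example, and the persistent setting usually lives in /etc/security/limits.conf on Linux:)

-----
shell$ ulimit -n           # show the current per-process descriptor limit
shell$ ulimit -n 16384     # raise it for this shell, if the hard limit allows

# persistent, system-wide setting (edit as root; values illustrative):
#   *   soft   nofile   16384
#   *   hard   nofile   16384
-----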


best regards
PK



Prasanna Ranganathan wrote:

Hi,

I am trying to run a test mpiHelloWorld program that simply initializes 
the MPI environment on all the nodes, prints the hostname and rank of 
each node in the MPI process group and exits.


I am using Open MPI 1.1.2 and am running 997 processes on 499 nodes (nodes 
have 2 dual-core CPUs).


I get the following error messages when I run my program as follows: 
mpirun -np 997 -bynode -hostfile nodelist /main/mpiHelloWorld

.
.
.
[0,1,380][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] 
[0,1,142][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] 
[0,1,140][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] 
[0,1,390][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] 
connect() failed with errno=113
connect() failed with errno=113connect() failed with errno=113connect() 
failed with 
errno=113[0,1,138][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] 

connect() failed with 
errno=113[0,1,384][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] 
[0,1,144][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] 
connect() failed with errno=113
[0,1,388][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] 
connect() failed with 
errno=113[0,1,386][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] 
connect() failed with errno=113
[0,1,139][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] 
connect() failed with errno=113

connect() failed with errno=113
.
.

The main thing is that I get these error messages around 3-4 times out 
of 10 attempts, with the rest all completing successfully. I have looked 
into the FAQs in detail and also checked the tcp btl settings but am not 
able to figure it out.

All the 499 nodes have only eth0 active and I get the error even when I 
run the following: mpirun -np 997 -bynode -hostfile nodelist --mca 
btl_tcp_if_include eth0 /main/mpiHelloWorld


I have attached the output of ompi_info --all.

The following is the output of /sbin/ifconfig on the node where I start 
the mpi process (it is one of the 499 nodes)


eth0  Link encap:Ethernet  HWaddr 00:03:25:44:8F:D6  
  inet addr:10.12.1.11  Bcast:10.12.255.255  Mask:255.255.0.0

  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:1978724556 errors:17 dropped:0 overruns:0 frame:17
  TX packets:1767028063 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:580938897359 (554026.5 Mb)  TX bytes:689318600552 
(657385.4 Mb)

  Interrupt:22 Base address:0xc000

loLink encap:Local Loopback  
  inet addr:127.0.0.1  Mask:255.0.0.0

  UP LOOPBACK RUNNING  MTU:16436  Metric:1
  RX packets:70560 errors:0 dropped:0 overruns:0 frame:0
  TX packets:70560 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0
  RX bytes:339687635 (323.9 Mb)  TX bytes:339687635 (323.9 Mb)


Kindly help.

Regards,

Prasanna.





