Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-15 Thread Jeff Squyres
Excellent! We developers have talked about creating an FAQ entry for running at large scale for a long time, but have never gotten a round tuit. I finally filed a ticket to do this (https://svn.open-mpi.org/trac/ompi/ticket/1503) -- these pending documentation tickets will likely be

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-15 Thread Prasanna Ranganathan
Hi, I am happy to state that I believe I have finally found the fix for the No route to host error. The solution was to increase the ARP cache size on the head node and also add a few static ARP entries. The cache was running out at some point during program execution, leading to connection
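On Linux, the ARP (neighbor) cache size is normally governed by the `net.ipv4.neigh.default.gc_thresh1/2/3` sysctls. The message above does not say which values were used, so the following is only a minimal sketch: the helper name `suggest_arp_thresholds` and the scaling factors are assumptions for illustration, and the script prints settings rather than applying them.

```shell
#!/bin/sh
# Print (not apply) sysctl settings that enlarge the Linux ARP/neighbor cache.
# Assumption: thresholds are scaled from the node count; the original message
# does not state which values were actually used.
suggest_arp_thresholds() {
    nodes=$1
    # gc_thresh1: below this many entries the garbage collector does not run
    # gc_thresh2: soft limit on cached entries
    # gc_thresh3: hard limit on cached entries
    echo "net.ipv4.neigh.default.gc_thresh1 = $((nodes * 2))"
    echo "net.ipv4.neigh.default.gc_thresh2 = $((nodes * 4))"
    echo "net.ipv4.neigh.default.gc_thresh3 = $((nodes * 8))"
}

# Example for the 499-node cluster discussed in this thread.
suggest_arp_thresholds 499
```

The printed lines could be placed in /etc/sysctl.conf and loaded as root with `sysctl -p`. A static entry can be added with `arp -s <ip> <mac>`, where the address pair is a placeholder to be filled in per node.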

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-15 Thread Eric Thibodeau
Simply to keep track of what's going on: I checked the build environment for openmpi and the system's setting, they were built using gcc 3.4.4 with -Os, which was reputed unstable and problematic with this compiler version. I've asked Prasanna to rebuild using -O2 but this could be a bit

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-12 Thread Eric Thibodeau
Prasanna, Please send me your /etc/make.conf and the contents of /var/db/pkg/sys-cluster/openmpi-1.2.7/ You can package this with the following command line: tar -cjf data.tbz /etc/make.conf /var/db/pkg/sys-cluster/openmpi-1.2.7/ And simply send me the data.tbz file. Thanks, Eric

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-12 Thread Prasanna Ranganathan
Hi, I did make sure at the beginning that only eth0 was activated on all the nodes. Nevertheless, I am currently verifying the NIC configuration on all the nodes and making sure things are as expected. While trying different things, I did come across this peculiar error which I had detailed in

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-12 Thread Matt Hughes
Hi Prasanna, do you have any unusual ethernet interfaces on your nodes? I have seen similar problems when using IP over Infiniband. I'm not sure exactly why, but mixing interfaces of different types (ib0 and eth0 for example) can sometimes cause these problems, possibly because they are on
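One way to rule out mixed interfaces is to pin Open MPI's TCP traffic to a single interface. The sketch below assumes the `btl_tcp_if_include` MCA parameter is available in the installed version (this can be checked with `ompi_info --param btl tcp`); the helper name `mpirun_cmd_for_iface` is hypothetical, and the script only prints the command instead of running it.

```shell
#!/bin/sh
# Build (and print, not run) an mpirun command that restricts TCP traffic
# to one network interface.
# Assumption: the btl_tcp_if_include MCA parameter exists in the installed
# Open MPI; verify with `ompi_info --param btl tcp` before relying on it.
mpirun_cmd_for_iface() {
    iface=$1; np=$2; hostfile=$3; prog=$4
    echo "mpirun -np $np -hostfile $hostfile -mca btl_tcp_if_include $iface $prog"
}

# Example using the hostfile and program names that appear in this thread.
mpirun_cmd_for_iface eth0 499 nodelist /main/mpiHelloWorld
```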

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-12 Thread Prasanna Ranganathan
Hi, I have verified the openMPI version to be 1.2.7 on all the nodes and also ompi_info | grep thread is Thread support: posix (mpi: no, progress: no) on these machines. I get the error with and without -mca oob_tcp_listen_mode listen_thread. Sometimes, the startup takes too long with the

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-12 Thread Jeff Squyres
On Sep 11, 2008, at 6:29 PM, Prasanna Ranganathan wrote: I have tried the following to no avail. On 499 machines running openMPI 1.2.7: mpirun -np 499 -bynode -hostfile nodelist /main/mpiHelloWorld ... With different combinations of the following parameters -mca btl_base_verbose 1 -mca

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-11 Thread Eric Thibodeau
Prasanna, I opened up a bug report to enable better control over the threading options (http://bugs.gentoo.org/show_bug.cgi?id=237435). In the meantime, if your helloWorld isn't too fluffy, could you send it over (off list if you prefer) so I can take a look at it, the Segmentation

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-11 Thread Eric Thibodeau
Jeff Squyres wrote: On Sep 11, 2008, at 3:27 PM, Eric Thibodeau wrote: Ok, added to the information from the README, I'm thinking none of the 3 configure options have an impact on the said 'threaded TCP listener' and the MCA option you suggested should still work, is this correct? It

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-11 Thread Jeff Squyres
On Sep 11, 2008, at 3:27 PM, Eric Thibodeau wrote: Ok, added to the information from the README, I'm thinking none of the 3 configure options have an impact on the said 'threaded TCP listener' and the MCA option you suggested should still work, is this correct? It should default to

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-11 Thread Eric Thibodeau
Jeff Squyres wrote: On Sep 11, 2008, at 2:38 PM, Eric Thibodeau wrote: In short: Which of the 3 options is the one known to be unstable in the following: --enable-mpi-threads Enable threads for MPI applications (default: disabled) --enable-progress-threads

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-11 Thread Jeff Squyres
On Sep 11, 2008, at 2:38 PM, Eric Thibodeau wrote: In short: Which of the 3 options is the one known to be unstable in the following: --enable-mpi-threads Enable threads for MPI applications (default: disabled) --enable-progress-threads

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-11 Thread Ralph Castain
The two configuration options that are disabled by default (--enable-mpi-threads and --enable-progress-threads) are both known to be unstable. The runtime listen_thread option is quite different and is known to be safe. Ralph On Sep 11, 2008, at 12:38 PM, Eric Thibodeau wrote: Jeff, In short: Which

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-11 Thread Eric Thibodeau
Jeff, In short: Which of the 3 options is the one known to be unstable in the following: --enable-mpi-threads Enable threads for MPI applications (default: disabled) --enable-progress-threads Enable threads asynchronous communication

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-11 Thread Eric Thibodeau
Jeff Squyres wrote: I'm not sure what USE=-threads means, but I would discourage the use of threads in the v1.2 series; our thread support is pretty much broken in the 1.2 series. That's exactly what it means, hence the following BFW I had originally inserted in the package to this effect:

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-10 Thread Eric Thibodeau
Prasanna Ranganathan wrote: Hi Eric, Thanks a lot for the reply. I am currently working on upgrading to 1.2.7. I do not quite follow your directions. What do you refer to when you say "try with USE=-threads..." I am referring to the USE variable which is used to set global package

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-10 Thread Prasanna Ranganathan
Hi, I have upgraded to 1.2.7 and am still noticing the issue. Kindly help. > Message: 1 > Date: Mon, 8 Sep 2008 16:43:33 -0400 > From: Jeff Squyres > Subject: Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2 > To: Open MPI Users

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-10 Thread Prasanna Ranganathan
Hi Eric, Thanks a lot for the reply. I am currently working on upgrading to 1.2.7. I do not quite follow your directions. What do you refer to when you say "try with USE=-threads..." Kindly excuse if it is a silly question and pardon my ignorance :D Regards, Prasanna.

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-10 Thread Eric Thibodeau
Prasanna, also make sure you try with USE=-threads ...as the ebuild states, it's _experimental_ ;) Keep your eye on: https://svn.open-mpi.org/trac/ompi/wiki/ThreadSafetySupport Eric Prasanna Ranganathan wrote: Hi, I have upgraded my openMPI to 1.2.6 (We have gentoo and emerge showed

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-10 Thread Eric Thibodeau
Prasanna Ranganathan wrote: Hi, I have upgraded my openMPI to 1.2.6 (We have gentoo and emerge showed 1.2.6-r1 to be the latest stable version of openMPI). Prasanna, do a sync, 1.2.7 is in portage and report back. Eric I do still get the following error message when running my test

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-10 Thread Prasanna Ranganathan
Hi, I have upgraded my openMPI to 1.2.6 (We have gentoo and emerge showed 1.2.6-r1 to be the latest stable version of openMPI). I do still get the following error message when running my test helloWorld program: [10.12.77.21][0,1,95][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_c

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-09 Thread Prasanna Ranganathan
Hi Jeff/Paul, Thanks a lot for your replies. I am looking into upgrading MPI to a newer version. As I use a few custom built libraries as part of my main parallel application that recommend the use of 1.1.2, I first need to check compatibility issues with the newer version before I can

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-09 Thread Paul Kapinos
Hi, First, consider updating to a newer OpenMPI. Second, look at your environment on the box where you start OpenMPI (where you run mpirun ...). Type ulimit -n to see how many file descriptors your environment allows (ulimit -a for all limits). Note, every process on older versions of OpenMPI (prior
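The `ulimit -n` check suggested above can be scripted as a rough sanity test. This is only a sketch: the helper name `fds_sufficient` and the estimate of one descriptor per process plus a fixed overhead of 64 are assumptions for illustration, since the actual requirement depends on the Open MPI version and launcher.

```shell
#!/bin/sh
# Compare the soft open-file limit with a rough per-job estimate.
# Assumption: one descriptor per process plus a small fixed overhead;
# the real requirement depends on the Open MPI version and launcher.
fds_sufficient() {
    limit=$1; procs=$2; overhead=${3:-64}
    # An unlimited descriptor count always passes.
    [ "$limit" = "unlimited" ] && return 0
    [ "$limit" -ge $((procs + overhead)) ]
}

# Check the current shell's limit against the 997 processes from this thread.
limit=$(ulimit -n)
if fds_sufficient "$limit" 997; then
    echo "ulimit -n = $limit looks sufficient for 997 processes"
else
    echo "ulimit -n = $limit may be too low for 997 processes"
fi
```

The check is meant to be run on the node that launches mpirun, since that is where the limit was suggested to matter.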

[OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-08 Thread Prasanna Ranganathan
Hi, I am trying to run a test mpiHelloWorld program that simply initializes the MPI environment on all the nodes, prints the hostname and rank of each node in the MPI process group and exits. I am using MPI 1.1.2 and am running 997 processes on 499 nodes (Nodes have 2 dual core CPUs). I get the