Re: [OMPI users] srun and openmpi

2011-04-28 Thread Ralph Castain
On Apr 27, 2011, at 1:06 PM, Michael Di Domenico wrote: > On Wed, Apr 27, 2011 at 2:46 PM, Ralph Castain wrote: >> >> On Apr 27, 2011, at 12:38 PM, Michael Di Domenico wrote: >> >>> On Wed, Apr 27, 2011 at 2:25 PM, Ralph Castain wrote: On Apr

[OMPI users] MPI_Comm_create prevents external socket connections

2011-04-28 Thread Randolph Pullen
I have a problem with MPI_Comm_create. My server application has 2 processes per node: 1 listener and 1 worker. Each listener monitors a specified port for incoming TCP connections, with the goal that on receipt of a request it will distribute it over the workers in a SIMD

Re: [OMPI users] problems with the -xterm option

2011-04-28 Thread jody
Hi Ralph, Is there an easy way I could modify the OpenMPI code so that it would use the -Y option for ssh when connecting to remote machines? Thank You Jody On Thu, Apr 7, 2011 at 4:01 PM, jody wrote: > Hi Ralph > thank you for your suggestions. After some fiddling, i

[OMPI users] --enable-progress-threads broken in 1.5.3?

2011-04-28 Thread Paul Kapinos
Hi OpenMPI folks, I've tried to install the 1.5.3 version with activated progress threads (just to try it out) in addition to --enable-mpi-threads. The installation was fine and I could also build binaries, but each mpiexec call hangs forever silently. With the very same configuration options but

Re: [OMPI users] --enable-progress-threads broken in 1.5.3?

2011-04-28 Thread Jeff Squyres
It is quite likely that --enable-progress-threads is broken. I think it's even disabled in 1.4.x; I wonder if we should do the same in 1.5.x... On Apr 28, 2011, at 5:20 AM, Paul Kapinos wrote: > Hi OpenMPI folks, > > I've tried to install the 1.5.3 version with activated progress threads (just

Re: [OMPI users] MPI_Comm_create prevents external socket connections

2011-04-28 Thread Jeff Squyres
MPI_Comm_create shouldn't have any effect on existing fd's. Have you run your code through a memory-checking debugger such as valgrind? On Apr 28, 2011, at 12:57 AM, Randolph Pullen wrote: > I have a problem with MPI_Comm_create, > > My server application has 2 processes per node, 1 listener

Re: [OMPI users] OpenMPI out of band TCP retry exceeded

2011-04-28 Thread Jeff Squyres
I do note that you are using an ancient version of Open MPI (1.2.8). Is there any way you can upgrade to a (much) later version, such as 1.4.3? That might improve your TCP connectivity -- we made improvements in those portions of the code over the years. On Apr 27, 2011, at 8:09 PM, Ralph

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-04-28 Thread Jeff Squyres
On Apr 27, 2011, at 10:02 AM, Brock Palen wrote: > Argh, our messed up environment with three generations on infiniband bit us, > Setting openib_cpc_include to rdmacm causes ib to not be used on our old DDR > ib on some of our hosts. Note that jobs will never run across our old DDR ib > and

Re: [OMPI users] OpenMPI out of band TCP retry exceeded

2011-04-28 Thread Ralph Castain
On Apr 28, 2011, at 6:04 AM, Jeff Squyres wrote: > I do note that you are using an ancient version of Open MPI (1.2.8). I don't think that is accurate - at least, the output doesn't match that old a version. The process name format is indicative of something 1.3 or more recent. What led you

Re: [OMPI users] OpenMPI out of band TCP retry exceeded

2011-04-28 Thread Jeff Squyres
On Apr 28, 2011, at 8:45 AM, Ralph Castain wrote: > What led you to conclude 1.2.8? > > /opt/openmpi/i386/bin/mpirun -mca btl_openib_verbose 1 --mca btl ^tcp > --mca pls_ssh_agent ssh -mca oob_tcp_peer_retries 1000 --prefix > /usr/lib/openmpi/1.2.8-gcc/bin -np 239 --app procgroup

Re: [OMPI users] srun and openmpi

2011-04-28 Thread Michael Di Domenico
On Wed, Apr 27, 2011 at 11:47 PM, Ralph Castain wrote: > > On Apr 27, 2011, at 1:06 PM, Michael Di Domenico wrote: > >> On Wed, Apr 27, 2011 at 2:46 PM, Ralph Castain wrote: >>> >>> On Apr 27, 2011, at 12:38 PM, Michael Di Domenico wrote: >>> On Wed,

Re: [OMPI users] OpenMPI out of band TCP retry exceeded

2011-04-28 Thread Sindhi, Waris PW
Yes, the procgroup file has more than 128 applications in it. % wc -l procgroup 239 procgroup Is 128 the maximum number of applications that can be in a procgroup file? Sincerely, Waris Sindhi High Performance Computing, TechApps Pratt & Whitney, UTC (860)-565-8486 -----Original Message----- From:

Re: [OMPI users] OpenMPI out of band TCP retry exceeded

2011-04-28 Thread Ralph Castain
On Apr 28, 2011, at 6:49 AM, Jeff Squyres wrote: > On Apr 28, 2011, at 8:45 AM, Ralph Castain wrote: > >> What led you to conclude 1.2.8? >> >> /opt/openmpi/i386/bin/mpirun -mca btl_openib_verbose 1 --mca btl ^tcp >> --mca pls_ssh_agent ssh -mca oob_tcp_peer_retries 1000 --prefix

Re: [OMPI users] OpenMPI out of band TCP retry exceeded

2011-04-28 Thread Ralph Castain
On Apr 28, 2011, at 6:56 AM, Sindhi, Waris PW wrote: > Yes the procgroup file has more than 128 applications in it. > > % wc -l procgroup > 239 procgroup > > Is 128 the max applications that can be in a procgroup file ? Yep - this limitation is lifted in the developer's trunk, but not yet
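A quick pre-flight check along these lines could catch the problem before launch. This is only a sketch: the 128 figure comes from the exchange above, while the helper names, the assumption that every non-blank line is one application (which is what `wc -l` effectively counted), and the sample line contents are all hypothetical.

```python
# Limit confirmed in the thread for the release in use; the thread notes
# that it is lifted in the developer's trunk.
MAX_APPS = 128


def count_app_lines(text):
    """Count non-blank lines, assuming one application per line (an assumption)."""
    return sum(1 for line in text.splitlines() if line.strip())


def exceeds_limit(text, limit=MAX_APPS):
    """Return True when the file lists more applications than the limit allows."""
    return count_app_lines(text) > limit


# Hypothetical file contents: 239 entries, matching the count reported by wc -l.
sample = "\n".join(f"-np 1 ./app_{i}" for i in range(239))
print(count_app_lines(sample), exceeds_limit(sample))
```

To check a real file, pass the result of `open("procgroup").read()` instead of `sample`.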

Re: [OMPI users] srun and openmpi

2011-04-28 Thread Ralph Castain
On Apr 28, 2011, at 6:49 AM, Michael Di Domenico wrote: > On Wed, Apr 27, 2011 at 11:47 PM, Ralph Castain wrote: >> >> On Apr 27, 2011, at 1:06 PM, Michael Di Domenico wrote: >> >>> On Wed, Apr 27, 2011 at 2:46 PM, Ralph Castain wrote: On Apr

Re: [OMPI users] problems with the -xterm option

2011-04-28 Thread Ralph Castain
Should be able to just set -mca plm_rsh_agent "ssh -Y" on your cmd line, I believe On Apr 28, 2011, at 12:53 AM, jody wrote: > Hi Ralph > > Is there an easy way i could modify the OpenMPI code so that it would use > the -Y option for ssh when connecting to remote machines? > > Thank You >

Re: [OMPI users] --enable-progress-threads broken in 1.5.3?

2011-04-28 Thread Eugene Loh
CMR 2728 did this. I think the changes are in 1.5.4. On 4/28/2011 5:00 AM, Jeff Squyres wrote: It is quite likely that --enable-progress-threads is broken. I think it's even disabled in 1.4.x; I wonder if we should do the same in 1.5.x...

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-04-28 Thread Brock Palen
Attached is the output of running with verbose 100, mpirun --mca btl_openib_cpc_include rdmacm --mca btl_base_verbose 100 NPmpi [nyx0665.engin.umich.edu:06399] mca: base: components_open: Looking for btl components [nyx0666.engin.umich.edu:07210] mca: base: components_open: Looking for btl

Re: [OMPI users] problems with the -xterm option

2011-04-28 Thread jody
Hi Unfortunately this does not solve my problem. While I can do ssh -Y squid_0 xterm and this will open an xterm on my machine (chefli), I run into problems with the -xterm option of openmpi: jody@chefli ~/share/neander $ mpirun -np 4 -mca plm_rsh_agent "ssh -Y" -host squid_0 --xterm 1

Re: [OMPI users] srun and openmpi

2011-04-28 Thread Ralph Castain
Per earlier in the thread, it looks like you are using a 1.5 series release - so here is a patch that -should- fix the PSM setup problem. Please let me know if/how it works as I honestly have no way of testing it. Ralph slurmd.diff Description: Binary data On Apr 28, 2011, at 7:03 AM, Ralph

Re: [OMPI users] problems with the -xterm option

2011-04-28 Thread Ralph Castain
No immediate suggestions - I won't get a chance to test this until later as I don't normally run an x11 server on my box, and don't have another way to test it. On Apr 28, 2011, at 8:38 AM, jody wrote: > Hi > > Unfortunately this does not solve my problem. > While i can do > ssh -Y squid_0

Re: [OMPI users] OpenMPI out of band TCP retry exceeded

2011-04-28 Thread Sindhi, Waris PW
The --prefix directory is a typo and no longer exists on our system. We are running version 1.4-4 of OpenMPI % /opt/openmpi/x86_64/bin/ompi_info Package: Open MPI mockbu...@x86-004.build.bos.redhat.com Distribution Open MPI: 1.4 Open MPI SVN revision: r22285

Re: [OMPI users] OpenMPI out of band TCP retry exceeded

2011-04-28 Thread Ralph Castain
We figured out that in the case where you provide the full path to mpirun -and- the -prefix option, we ignore the latter anyway. :-/ I'm working on a patch to at least warn you we are ignoring it. On Apr 28, 2011, at 2:03 PM, Sindhi, Waris PW wrote: > The --prefix directory is a typo and no
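Until such a warning exists, a wrapper script could flag the combination itself. The sketch below only inspects an argument list; the function name is hypothetical, and it encodes nothing about Open MPI beyond the behavior described in the message above (full path to mpirun plus a prefix option means the prefix is ignored).

```python
import posixpath


def prefix_will_be_ignored(argv):
    """Return True if mpirun is given by absolute path and a prefix option is present."""
    if not argv:
        return False
    has_full_path = posixpath.isabs(argv[0])
    # Both spellings appear in the thread: --prefix and -prefix.
    has_prefix = any(arg in ("--prefix", "-prefix") for arg in argv[1:])
    return has_full_path and has_prefix


# Arguments taken from the command line quoted earlier in the thread.
cmd = ["/opt/openmpi/i386/bin/mpirun", "--prefix",
       "/usr/lib/openmpi/1.2.8-gcc/bin", "-np", "239", "--app", "procgroup"]
if prefix_will_be_ignored(cmd):
    print("warning: --prefix is ignored when mpirun is invoked by full path")
```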

Re: [OMPI users] srun and openmpi

2011-04-28 Thread Michael Di Domenico
On Thu, Apr 28, 2011 at 9:03 AM, Ralph Castain wrote: > > On Apr 28, 2011, at 6:49 AM, Michael Di Domenico wrote: > >> On Wed, Apr 27, 2011 at 11:47 PM, Ralph Castain wrote: >>> >>> On Apr 27, 2011, at 1:06 PM, Michael Di Domenico wrote: >>> On Wed, Apr

Re: [OMPI users] OpenMPI out of band TCP retry exceeded

2011-04-28 Thread Sindhi, Waris PW
Do you know when this fix is slated for an official release? Sincerely, Waris Sindhi High Performance Computing, TechApps Pratt & Whitney, UTC (860)-565-8486 -----Original Message----- From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain Sent:

Re: [OMPI users] problems with the -xterm option

2011-04-28 Thread Ralph Castain
Hi Jody I'm not sure when I'll get a chance to work on this - got a deadline to meet. I do have a couple of suggestions, if you wouldn't mind helping debug the problem? It looks to me like the problem is that mpirun is crashing or terminating early for some reason - hence the failures to send

Re: [OMPI users] problems with the -xterm option

2011-04-28 Thread jody
Hi Ralph, Thank you for your suggestions. I'll be happy to help you. I'm not sure if I'll get around to this tomorrow, but I certainly will do so on Monday. Thanks Jody On Thu, Apr 28, 2011 at 11:53 PM, Ralph Castain wrote: > Hi Jody > > I'm not sure when I'll get a chance