Re: [OMPI users] OpenMPI out of band TCP retry exceeded

2011-04-27 Thread Ralph Castain
On Apr 27, 2011, at 1:31 PM, Sindhi, Waris PW wrote:
> No we do not have a firewall turned on. I can run smaller 96 slave cases
> on ln10 and ln13 included on the slavelist.
>
> Could there be another reason for this to fail ?

What is in "procgroup"? Is it a single application? Offhand,

Re: [OMPI users] Need help building OpenMPI with Intel v12.0 compilers on Linux

2011-04-27 Thread Tru Huynh
On Thu, Apr 28, 2011 at 12:46:27AM +0200, Tru Huynh wrote:
> On Thu, Apr 21, 2011 at 06:35:16PM -0400, Jeff Squyres wrote:
> > It's normal and expected for there to be lots of errors in config.log.
> >
> > There's a bunch of tests in configure that are designed to succeed on some
> > systems

Re: [OMPI users] Need help building OpenMPI with Intel v12.0 compilers on Linux

2011-04-27 Thread Tru Huynh
On Thu, Apr 21, 2011 at 06:35:16PM -0400, Jeff Squyres wrote:
> It's normal and expected for there to be lots of errors in config.log.
>
> There's a bunch of tests in configure that are designed to succeed on some
> systems and fail on others.
>
> So don't read anything into the failures

Re: [OMPI users] srun and openmpi

2011-04-27 Thread Jeff Squyres
On Apr 27, 2011, at 3:39 PM, Ralph Castain wrote:
> Nope, nope nope...in this mode of operation, we are using -static- ports.

Er.. right. Sorry -- my bad for not reading the full context here... ignore what I said...

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:

Re: [OMPI users] srun and openmpi

2011-04-27 Thread Ralph Castain
On Apr 27, 2011, at 1:27 PM, Jeff Squyres wrote:
> On Apr 27, 2011, at 2:46 PM, Ralph Castain wrote:
>
>> Actually, I understood you correctly. I'm just saying that I find no
>> evidence in the code that we try three times before giving up. What I see is
>> a single attempt to bind the port -

Re: [OMPI users] OpenMPI out of band TCP retry exceeded

2011-04-27 Thread Sindhi, Waris PW
No we do not have a firewall turned on. I can run smaller 96 slave cases on ln10 and ln13 included on the slavelist.

Could there be another reason for this to fail ?

Sincerely,
Waris Sindhi
High Performance Computing, TechApps
Pratt & Whitney, UTC
(860)-565-8486

-----Original Message-----

Re: [OMPI users] srun and openmpi

2011-04-27 Thread Jeff Squyres
On Apr 27, 2011, at 2:46 PM, Ralph Castain wrote:
> Actually, I understood you correctly. I'm just saying that I find no evidence
> in the code that we try three times before giving up. What I see is a single
> attempt to bind the port - if it fails, then we abort. There is no parameter
> to
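To illustrate the behavior Ralph describes (one attempt to bind a static port, abort if it fails), here is a minimal C sketch using plain POSIX sockets. It is not Open MPI's code: the helper name bind_static_port_once and the use of the loopback address are assumptions made for illustration. If another process already holds the port, bind() fails with EADDRINUSE and the helper gives up at once instead of retrying.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not Open MPI's implementation): make exactly one
 * attempt to bind a listening TCP socket to a fixed ("static") port on
 * 127.0.0.1. Returns the socket fd on success, or -1 on failure.
 * There is deliberately no retry: if the port is taken, bind() fails with
 * EADDRINUSE and the caller is expected to abort. */
int bind_static_port_once(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        fprintf(stderr, "bind to port %u failed: %s\n",
                (unsigned)port, strerror(errno));
        close(fd);
        return -1;               /* single attempt: give up immediately */
    }
    if (listen(fd, 16) < 0) {
        perror("listen");
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller that wanted retries would have to loop around this helper itself; with a fixed port there is no alternative port to fall back to, so a retry only helps if the other holder releases the port in the meantime.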

Re: [OMPI users] srun and openmpi

2011-04-27 Thread Michael Di Domenico
On Wed, Apr 27, 2011 at 2:46 PM, Ralph Castain wrote:
>
> On Apr 27, 2011, at 12:38 PM, Michael Di Domenico wrote:
>
>> On Wed, Apr 27, 2011 at 2:25 PM, Ralph Castain wrote:
>>>
>>> On Apr 27, 2011, at 10:09 AM, Michael Di Domenico wrote:
>>> Was this

Re: [OMPI users] srun and openmpi

2011-04-27 Thread Ralph Castain
On Apr 27, 2011, at 12:38 PM, Michael Di Domenico wrote:
> On Wed, Apr 27, 2011 at 2:25 PM, Ralph Castain wrote:
>>
>> On Apr 27, 2011, at 10:09 AM, Michael Di Domenico wrote:
>>
>>> Was this ever committed to the OMPI src as something not having to be
>>> run outside of

Re: [OMPI users] srun and openmpi

2011-04-27 Thread Michael Di Domenico
On Wed, Apr 27, 2011 at 2:25 PM, Ralph Castain wrote:
>
> On Apr 27, 2011, at 10:09 AM, Michael Di Domenico wrote:
>
>> Was this ever committed to the OMPI src as something not having to be
>> run outside of OpenMPI, but as part of the PSM setup that OpenMPI
>> does?
>
> Not

Re: [OMPI users] srun and openmpi

2011-04-27 Thread Ralph Castain
On Apr 27, 2011, at 10:09 AM, Michael Di Domenico wrote:

> Was this ever committed to the OMPI src as something not having to be
> run outside of OpenMPI, but as part of the PSM setup that OpenMPI
> does?

Not that I know of - I don't think the PSM developers ever looked at it.

>
> I'm having

[OMPI users] OpenMPI out of band TCP retry exceeded

2011-04-27 Thread Sindhi, Waris PW
Hi,

I am getting an "oob-tcp: Communication retries exceeded" error message when I run a 238 MPI slave code:

/opt/openmpi/i386/bin/mpirun -mca btl_openib_verbose 1 --mca btl ^tcp --mca pls_ssh_agent ssh -mca oob_tcp_peer_retries 1000 --prefix /usr/lib/openmpi/1.2.8-gcc/bin -np 239 --app
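For readers unfamiliar with this kind of message: a "retries exceeded" error generally means a connection loop used up its retry budget without ever connecting. The C sketch below shows that general pattern as a bounded connect-retry loop. It is a generic illustration, not the ORTE oob-tcp implementation; connect_with_retries, max_retries and delay_ms are hypothetical names, and max_retries only plays a role analogous to what the oob_tcp_peer_retries parameter name suggests.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (generic pattern, not ORTE's code): try to connect
 * to 127.0.0.1:port at most max_retries times, sleeping delay_ms between
 * failed attempts. Stores the number of attempts made in *attempts_made
 * (if non-NULL). Returns a connected fd, or -1 once the budget is used up. */
int connect_with_retries(unsigned short port, int max_retries,
                         long delay_ms, int *attempts_made)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    int made = 0;
    while (made < max_retries) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) {
            perror("socket");
            break;
        }
        made++;
        if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0) {
            if (attempts_made)
                *attempts_made = made;
            return fd;                     /* connected */
        }
        fprintf(stderr, "attempt %d/%d failed: %s\n",
                made, max_retries, strerror(errno));
        close(fd);                         /* a failed socket is not reused */

        if (made < max_retries) {
            struct timespec ts = { delay_ms / 1000,
                                   (delay_ms % 1000) * 1000000L };
            nanosleep(&ts, NULL);
        }
    }
    if (attempts_made)
        *attempts_made = made;
    fprintf(stderr, "giving up: communication retries exceeded\n");
    return -1;
}
```

Raising the retry count only helps when the peer eventually becomes reachable; if the peer never starts listening, a larger budget just delays the same failure.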

Re: [OMPI users] srun and openmpi

2011-04-27 Thread Michael Di Domenico
Was this ever committed to the OMPI src as something not having to be run outside of OpenMPI, but as part of the PSM setup that OpenMPI does?

I'm having some trouble getting Slurm/OpenMPI to play nice with the setup of this key. Namely, with slurm you cannot export variables from the --prolog of
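The thread does not spell out the key's format. As a sketch only, assuming the key is a random 128-bit value written as two 16-digit hexadecimal halves joined by a dash, a small C helper could generate one so that the same value can be placed in the environment of every process of a job before launch. The helper names random_key_halves and format_job_key are hypothetical and the format itself is an assumption; check the Open MPI documentation for your version before relying on it.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read two random 64-bit halves from /dev/urandom.
 * Returns 0 on success, -1 if the device cannot be opened or read. */
int random_key_halves(uint64_t *hi, uint64_t *lo)
{
    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(hi, sizeof *hi, 1, f);
    got += fread(lo, sizeof *lo, 1, f);
    fclose(f);
    return got == 2 ? 0 : -1;
}

/* Hypothetical helper: write the key as "<16 hex digits>-<16 hex digits>"
 * (assumed format). out_len must be at least 34 (33 chars + NUL).
 * Returns 0 on success, -1 if the buffer is too small. */
int format_job_key(uint64_t hi, uint64_t lo, char *out, size_t out_len)
{
    int n = snprintf(out, out_len, "%016llx-%016llx",
                     (unsigned long long)hi, (unsigned long long)lo);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

The key has to be generated once per job and shared by all of its processes; generating it separately on each node would give every process a different value.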

Re: [OMPI users] RES: RES: RES: Error with ARM target

2011-04-27 Thread Jeff Squyres
FWIW, my ARM contact tells me that he uses a native ARM Linux distro explicitly to avoid all the complexities of cross-compiling... :-\

On Apr 25, 2011, at 11:29 AM, Jeff Squyres wrote:
> There's some extra special mojo that needs to be supplied when
> cross-compiling Open MPI (e.g., a file

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-04-27 Thread Brock Palen
Argh, our messed-up environment with three generations of InfiniBand bit us. Setting openib_cpc_include to rdmacm causes IB not to be used on our old DDR IB on some of our hosts. Note that jobs will never run across both our old DDR IB and our new QDR stuff, where rdmacm does work. I am doing some