Re: [OMPI users] How are the Open MPI processes spawned?

2011-12-06 Thread Ralph Castain
I'll take a look at having the rsh launcher forward MCA params up to the cmd 
line limit, and warn if there are too many to fit. Shouldn't be too hard, I 
would think.


On Dec 6, 2011, at 1:28 PM, Paul Kapinos wrote:

> Hello Jeff, Ralph, all!
> 
 Meaning that per my output from above, what Paul was trying should have 
 worked, no?  I.e., setenv'ing OMPI_*, and those env vars should 
 magically show up in the launched process.
>>> In the -launched process- yes. However, his problem was that they do not 
>>> show up for the -orteds-, and thus the orteds don't wireup correctly.
> 
> Sorry for the latency, too many issues in too many areas needing improvement :-/
> Well, just to clarify the long story of what I have seen:
> 
> 1. got a strange start-up problem (caused by a bogus configuration of eth0 plus a 
> bug in 1.5.x which is well known to you experts :o)
> 
> 2. got a workaround for (1.) by setting '-mca oob_tcp_if_include ib0 -mca 
> btl_tcp_if_include ib0' on the command line of mpiexec => WORKS! Many thanks 
> guys!
> 
> 3. remembered that any MCA parameter can also be defined via an OMPI_MCA_... 
> envvar, tried setting them that way => does NOT work, the hang-ups were back. 
> Checked with ompi_info how the MCA parameters are set - all clear, but it still 
> doesn't work. My blind guess was that mpiexec does not understand these envvars 
> in this case.
> See also http://www.open-mpi.org/community/lists/users/2011/11/17823.php
> 
> Thus this issue is not about forwarding some or any OMPI_* envvars to the 
> _processes_, but about a step _before_ that (in my problem case the processes 
> were not started correctly at all), as Ralph wrote.
> 
> The difference in behaviour between setting the parameters on the command line 
> and via OMPI_* envvars matters!
> 
> 
> Ralph Castain wrote:
> >> Did you file it, or someone else, or should I do it in some way?
> > I'll take care of it, and copy you on the ticket so you can see
> > what happens. I'll also do the same for the connection bug
> > - sorry for the problem :-(
> 
> Ralph, many thanks for this!
> 
> Best wishes and a nice evening/day/whatever time you have!
> 
> Paul Kapinos
> 
> 
> 
> 
> -- 
> Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
> RWTH Aachen University, Center for Computing and Communication
> Seffenter Weg 23,  D 52074  Aachen (Germany)
> Tel: +49 241/80-24915
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] How are the Open MPI processes spawned?

2011-12-06 Thread Paul Kapinos

Hello Jeff, Ralph, all!


Meaning that per my output from above, what Paul was trying should have worked, no?  
I.e., setenv'ing OMPI_*, and those env vars should magically show up 
in the launched process.

In the -launched process- yes. However, his problem was that they do not show 
up for the -orteds-, and thus the orteds don't wireup correctly.


Sorry for the latency, too many issues in too many areas needing improvement :-/
Well, just to clarify the long story of what I have seen:

1. got a strange start-up problem (caused by a bogus configuration of eth0 
plus a bug in 1.5.x which is well known to you experts :o)


2. got a workaround for (1.) by setting  '-mca oob_tcp_if_include ib0 
-mca btl_tcp_if_include ib0' on the command line of mpiexec => WORKS! 
Many thanks guys!


3. remembered that any MCA parameter can also be defined via an 
OMPI_MCA_... envvar, tried setting them that way => does NOT work, the 
hang-ups were back. Checked with ompi_info how the MCA parameters are set - 
all clear, but it still doesn't work. My blind guess was that mpiexec does 
not understand these envvars in this case.

See also http://www.open-mpi.org/community/lists/users/2011/11/17823.php

Thus this issue is not about forwarding some or any OMPI_* envvars to 
the _processes_, but about a step _before_ that (in my problem case the 
processes were not started correctly at all), as Ralph wrote.


The difference in behaviour between setting the parameters on the command 
line and via OMPI_* envvars matters!



Ralph Castain wrote:
>> Did you file it, or someone else, or should I do it in some way?
> I'll take care of it, and copy you on the ticket so you can see
> what happens. I'll also do the same for the connection bug
> - sorry for the problem :-(

Ralph, many thanks for this!

Best wishes and a nice evening/day/whatever time you have!

Paul Kapinos




--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915




Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-28 Thread Jeff Squyres
On Nov 28, 2011, at 7:39 PM, Ralph Castain wrote:

>> Meaning that per my output from above, what Paul was trying should have 
>> worked, no?  I.e., setenv'ing OMPI_*, and those env vars should 
>> magically show up in the launched process.
> 
> In the -launched process- yes. However, his problem was that they do not show 
> up for the -orteds-, and thus the orteds don't wireup correctly.

Now I get it.  Sorry for the noise.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-28 Thread Ralph Castain

On Nov 28, 2011, at 5:32 PM, Jeff Squyres wrote:

> On Nov 28, 2011, at 6:56 PM, Ralph Castain wrote:
> Right-o.  Knew there was something I forgot...
> 
>> So on rsh, we do not put envar mca params onto the orted cmd line. This has 
>> been noted repeatedly on the user and devel lists, so it really has always 
>> been the case.
> 
> So they're sent as part of the launch command (i.e., out of band -- not on 
> the rsh/ssh command line), right?

Yes

> 
> Meaning that per my output from above, what Paul was trying should have 
> worked, no?  I.e., setenv'ing OMPI_*, and those env vars should 
> magically show up in the launched process.

In the -launched process- yes. However, his problem was that they do not show 
up for the -orteds-, and thus the orteds don't wireup correctly.
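
(Schematically - a sketch reusing the host names from Jeff's test, not a literal transcript:

% setenv OMPI_MCA_oob_tcp_if_include ib0
% mpirun -np 2 --host svbu-mpi043,svbu-mpi044 ./run
  # ./run - the launched MPI process - does see OMPI_MCA_oob_tcp_if_include,
  # but the orted that ssh started on each remote node never received it,
  # so the oob wireup still picks whatever interface it finds first.
)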






Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-28 Thread Jeff Squyres
On Nov 28, 2011, at 6:56 PM, Ralph Castain wrote:

> I'm afraid that example is incorrect - you were running under slurm on your 
> cluster, not rsh.

Ummm... right.  Duh.

> If you look at the actual code, you will see that we slurp up the envars into 
> the environment of each app_context, and then send that to the backend.

Ah, right -- here's running under rsh (SVN trunk):

-
[16:26] svbu-mpi:~ % cat run
#!/bin/csh -f

echo on `hostname`, foo is: $OMPI_MCA_foo
exit 0
[16:26] svbu-mpi:~ % mpirun -np 2 --host svbu-mpi043,svbu-mpi044 run
OMPI_MCA_foo: Undefined variable.
OMPI_MCA_foo: Undefined variable.
---
While the primary job  terminated normally, 2 processes returned
non-zero exit codes.. Further examination may be required.
---
[16:26] svbu-mpi:~ % setenv OMPI_MCA_foo bar
[16:27] svbu-mpi:~ % mpirun -np 2 --host svbu-mpi043,svbu-mpi044 run
on svbu-mpi044, foo is: bar
on svbu-mpi043, foo is: bar
[16:27] svbu-mpi:~ % 
-

(the "MCA_" here is superfluous -- I looked at the code and the man page that 
Paul cited, and see that we grab all env vars OMPI_*, not OMPI_MCA_*).
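
So even a non-MCA variable should make it through - a sketch, not a transcript; 
OMPI_MY_TAG is a made-up name and the hosts are the ones from above:

% setenv OMPI_MY_TAG hello
% mpirun -np 2 --host svbu-mpi043,svbu-mpi044 printenv OMPI_MY_TAG

and each rank should print "hello", because any OMPI_* variable gets bundled into 
the app_context environment that is sent to the back-end nodes.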

> In environments like slurm, we can also apply those envars to the launch of 
> the orteds as we pass the env to the API that launches the orteds. You cannot 
> do that with rsh, as you know.

Right-o.  Knew there was something I forgot...

> So on rsh, we do not put envar mca params onto the orted cmd line. This has 
> been noted repeatedly on the user and devel lists, so it really has always 
> been the case.

So they're sent as part of the launch command (i.e., out of band -- not on the 
rsh/ssh command line), right?

Meaning that per my output from above, what Paul was trying should have worked, 
no?  I.e., setenv'ing OMPI_*, and those env vars should magically show 
up in the launched process.

 [after performing some RTFM...]
 at least the man page of mpiexec says, the OMPI_ environment variables are 
 always provided and thus treated *differently* than other envvars:
 
 $ man mpiexec
 
 Exported Environment Variables
All environment variables that are named in the form OMPI_* will  
 automatically  be  exported to new processes on the local and remote nodes.
 
 So, does the man page lie, or is this a removed feature, or something 
 else?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-28 Thread Ralph Castain
I'm afraid that example is incorrect - you were running under slurm on your 
cluster, not rsh. If you look at the actual code, you will see that we slurp up 
the envars into the environment of each app_context, and then send that to the 
backend.

In environments like slurm, we can also apply those envars to the launch of the 
orteds as we pass the env to the API that launches the orteds. You cannot do 
that with rsh, as you know.

So on rsh, we do not put envar mca params onto the orted cmd line. This has 
been noted repeatedly on the user and devel lists, so it really has always been 
the case.
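
(Schematically, under rsh - a sketch:

% setenv OMPI_MCA_oob_tcp_if_include ib0
% mpirun ...                               # reaches the launched app procs, NOT the orteds

% mpirun -mca oob_tcp_if_include ib0 ...   # an -mca param given on the cmd line does reach
                                           # the orteds - which is why Paul's cmd-line
                                           # workaround fixes the wireup
)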

HTH
Ralph

On Nov 28, 2011, at 3:39 PM, Jeff Squyres wrote:

> (off list)
> 
> Are you sure about OMPI_MCA_* params not being treated specially?  I know for 
> a fact that they *used* to be.  I.e., we bundled up all env variables that 
> began with OMPI_MCA_* and sent them with the job to back-end nodes.  It 
> allowed sysadmins to set global MCA param values without editing the MCA 
> param file on every node.
> 
> It looks like this is still happening on the trunk:
> 
> [14:38] svbu-mpi:~ % cat run
> #!/bin/csh -f
> 
> echo on `hostname`, foo is: $OMPI_MCA_foo
> exit 0
> [14:38] svbu-mpi:~ % setenv OMPI_MCA_foo bar
> [14:38] svbu-mpi:~ % ./run
> on svbu-mpi.cisco.com, foo is: bar
> [14:38] svbu-mpi:~ % mpirun -np 2 --bynode run
> on svbu-mpi044, foo is: bar
> on svbu-mpi043, foo is: bar
> [14:38] svbu-mpi:~ % unsetenv OMPI_MCA_foo
> [14:38] svbu-mpi:~ % mpirun -np 2 --bynode run
> OMPI_MCA_foo: Undefined variable.
> OMPI_MCA_foo: Undefined variable.
> ---
> While the primary job  terminated normally, 2 processes returned
> non-zero exit codes.. Further examination may be required.
> ---
> [14:38] svbu-mpi:~ % 
> 
> (I did not read this thread too carefully, so perhaps I missed an inference 
> in here somewhere...)
> 
> 
> 
> 
> 
> On Nov 25, 2011, at 5:21 PM, Ralph Castain wrote:
> 
>> 
>> On Nov 25, 2011, at 12:29 PM, Paul Kapinos wrote:
>> 
>>> Hello again,
>>> 
> Ralph Castain wrote:
>> Yes, that would indeed break things. The 1.5 series isn't correctly 
>> checking connections across multiple interfaces until it finds one that 
>> works - it just uses the first one it sees. :-(
> Yahhh!!
> This behaviour - catch a random interface and hang forever if something 
> is wrong with it - is somewhat less than perfect.
> 
> From my perspective - the user's one - OpenMPI should try to use either 
> *all* available networks (as 1.4 does...), starting with the high 
> performance ones, or *only* those interfaces on which the hostnames from 
> the hostfile are bound to.
 It is indeed supposed to do the former - as I implied, this is a bug in 
 the 1.5 series.
>>> 
>>> Thanks for clarification. I was not sure about this is a bug or a feature 
>>> :-)
>>> 
>>> 
>>> 
> Also, there should be timeouts (if you cannot connect to a node within a 
> minute you probably will never ever be connected...)
 We have debated about this for some time - there is a timeout mca param 
 one can set, but we'll consider again making it default.
> If some connection runs into a timeout a warning would be great (and a 
> hint to take off the interface by oob_tcp_if_exclude, btl_tcp_if_exclude).
> 
> Should it not?
> Maybe you can file it as a "call for enhancement"...
 Probably the right approach at this time.
>>> 
>>> Ahhh.. sorry, did not understand what you mean.
>>> Did you filed it, or someone else, or should I do it in some way? Or should 
>>> not?
>> 
>> I'll take care of it, and copy you on the ticket so you can see what happens.
>> 
>> I'll also do the same for the connection bug - sorry for the problem :-(
>> 
>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
> But then I ran into yet another one issue. In 
> http://www.open-mpi.org/faq/?category=tuning#setting-mca-params
> the way to define MCA parameters over environment variables is described.
> 
> I tried it:
> $ export OMPI_MCA_oob_tcp_if_include=ib0
> $ export OMPI_MCA_btl_tcp_if_include=ib0
> 
> 
> I checked it:
> $ ompi_info --param all all | grep oob_tcp_if_include
>  MCA oob: parameter "oob_tcp_if_include" (current value: 
> , data source: environment or cmdline)
> $ ompi_info --param all all | grep btl_tcp_if_include
>  MCA btl: parameter "btl_tcp_if_include" (current value: 
> , data source: environment or cmdline)
> 
> 
> But then I get again the hang-up issue!
> 
> ==> seem, mpiexec does not understand these environment variables! and 
> only get the command line options. This should not be so?
 No, that isn't what is happening. The problem lies in the behavior of 
 rsh/ssh. This environment does not forward environmental variables. 
 Because of limits on cmd 

Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-28 Thread Jeff Squyres
On Nov 28, 2011, at 5:39 PM, Jeff Squyres wrote:

> (off list)

Hah!  So much for me discreetly asking off-list before coming back with a 
definitive answer...  :-\

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-28 Thread Jeff Squyres
(off list)

Are you sure about OMPI_MCA_* params not being treated specially?  I know for a 
fact that they *used* to be.  I.e., we bundled up all env variables that began 
with OMPI_MCA_* and sent them with the job to back-end nodes.  It allowed 
sysadmins to set global MCA param values without editing the MCA param file on 
every node.

It looks like this is still happening on the trunk:

[14:38] svbu-mpi:~ % cat run
#!/bin/csh -f

echo on `hostname`, foo is: $OMPI_MCA_foo
exit 0
[14:38] svbu-mpi:~ % setenv OMPI_MCA_foo bar
[14:38] svbu-mpi:~ % ./run
on svbu-mpi.cisco.com, foo is: bar
[14:38] svbu-mpi:~ % mpirun -np 2 --bynode run
on svbu-mpi044, foo is: bar
on svbu-mpi043, foo is: bar
[14:38] svbu-mpi:~ % unsetenv OMPI_MCA_foo
[14:38] svbu-mpi:~ % mpirun -np 2 --bynode run
OMPI_MCA_foo: Undefined variable.
OMPI_MCA_foo: Undefined variable.
---
While the primary job  terminated normally, 2 processes returned
non-zero exit codes.. Further examination may be required.
---
[14:38] svbu-mpi:~ % 

(I did not read this thread too carefully, so perhaps I missed an inference in 
here somewhere...)





On Nov 25, 2011, at 5:21 PM, Ralph Castain wrote:

> 
> On Nov 25, 2011, at 12:29 PM, Paul Kapinos wrote:
> 
>> Hello again,
>> 
 Ralph Castain wrote:
> Yes, that would indeed break things. The 1.5 series isn't correctly 
> checking connections across multiple interfaces until it finds one that 
> works - it just uses the first one it sees. :-(
 Yahhh!!
 This behaviour - catch a random interface and hang forever if something is 
 wrong with it - is somewhat less than perfect.
 
 From my perspective - the user's one - OpenMPI should try to use either 
 *all* available networks (as 1.4 does...), starting with the high 
 performance ones, or *only* those interfaces on which the hostnames from 
 the hostfile are bound to.
>>> It is indeed supposed to do the former - as I implied, this is a bug in the 
>>> 1.5 series.
>> 
>> Thanks for clarification. I was not sure about this is a bug or a feature :-)
>> 
>> 
>> 
 Also, there should be timeouts (if you cannot connect to a node within a 
 minute you probably will never ever be connected...)
>>> We have debated about this for some time - there is a timeout mca param one 
>>> can set, but we'll consider again making it default.
 If some connection runs into a timeout a warning would be great (and a 
 hint to take off the interface by oob_tcp_if_exclude, btl_tcp_if_exclude).
 
 Should it not?
 Maybe you can file it as a "call for enhancement"...
>>> Probably the right approach at this time.
>> 
>> Ahhh.. sorry, did not understand what you mean.
>> Did you filed it, or someone else, or should I do it in some way? Or should 
>> not?
> 
> I'll take care of it, and copy you on the ticket so you can see what happens.
> 
> I'll also do the same for the connection bug - sorry for the problem :-(
> 
> 
>> 
>> 
>> 
>> 
>> 
>> 
 But then I ran into yet another one issue. In 
 http://www.open-mpi.org/faq/?category=tuning#setting-mca-params
 the way to define MCA parameters over environment variables is described.
 
 I tried it:
 $ export OMPI_MCA_oob_tcp_if_include=ib0
 $ export OMPI_MCA_btl_tcp_if_include=ib0
 
 
 I checked it:
 $ ompi_info --param all all | grep oob_tcp_if_include
   MCA oob: parameter "oob_tcp_if_include" (current value: 
 , data source: environment or cmdline)
 $ ompi_info --param all all | grep btl_tcp_if_include
   MCA btl: parameter "btl_tcp_if_include" (current value: 
 , data source: environment or cmdline)
 
 
 But then I get again the hang-up issue!
 
 ==> seem, mpiexec does not understand these environment variables! and 
 only get the command line options. This should not be so?
>>> No, that isn't what is happening. The problem lies in the behavior of 
>>> rsh/ssh. This environment does not forward environmental variables. Because 
>>> of limits on cmd line length, we don't automatically forward MCA params 
>>> from the environment, but only from the cmd line. It is an annoying 
>>> limitation, but one outside our control.
>> 
>> We know about "ssh does not forward environmental variables." But in this 
>> case, are these parameters not the parameters of mpiexec itself, too?
>> 
>> The crucial thing is, that setting of the parameters works over the command 
>> line but *does not work* over the envvar way (as in 
>> http://www.open-mpi.org/faq/?category=tuning#setting-mca-params described). 
>> This looks like a bug for me!
>> 
>> 
>> 
>> 
>> 
>>> Put those envars in the default mca param file and the problem will be 
>>> resolved.
>> 
>> You mean e.g. $prefix/etc/openmpi-mca-params.conf as described in 4. of 
>> 

Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-25 Thread Ralph Castain

On Nov 25, 2011, at 12:29 PM, Paul Kapinos wrote:

> Hello again,
> 
>>> Ralph Castain wrote:
 Yes, that would indeed break things. The 1.5 series isn't correctly 
 checking connections across multiple interfaces until it finds one that 
 works - it just uses the first one it sees. :-(
>>> Yahhh!!
>>> This behaviour - catch a random interface and hang forever if something is 
>>> wrong with it - is somewhat less than perfect.
>>> 
>>> From my perspective - the user's one - OpenMPI should try to use either 
>>> *all* available networks (as 1.4 does...), starting with the high 
>>> performance ones, or *only* those interfaces on which the hostnames from 
>>> the hostfile are bound to.
>> It is indeed supposed to do the former - as I implied, this is a bug in the 
>> 1.5 series.
> 
> Thanks for clarification. I was not sure about this is a bug or a feature :-)
> 
> 
> 
>>> Also, there should be timeouts (if you cannot connect to a node within a 
>>> minute you probably will never ever be connected...)
>> We have debated about this for some time - there is a timeout mca param one 
>> can set, but we'll consider again making it default.
>>> If some connection runs into a timeout a warning would be great (and a hint 
>>> to take off the interface by oob_tcp_if_exclude, btl_tcp_if_exclude).
>>> 
>>> Should it not?
>>> Maybe you can file it as a "call for enhancement"...
>> Probably the right approach at this time.
> 
> Ahhh.. sorry, did not understand what you mean.
> Did you filed it, or someone else, or should I do it in some way? Or should 
> not?

I'll take care of it, and copy you on the ticket so you can see what happens.

I'll also do the same for the connection bug - sorry for the problem :-(


> 
> 
> 
> 
> 
> 
>>> But then I ran into yet another one issue. In 
>>> http://www.open-mpi.org/faq/?category=tuning#setting-mca-params
>>> the way to define MCA parameters over environment variables is described.
>>> 
>>> I tried it:
>>> $ export OMPI_MCA_oob_tcp_if_include=ib0
>>> $ export OMPI_MCA_btl_tcp_if_include=ib0
>>> 
>>> 
>>> I checked it:
>>> $ ompi_info --param all all | grep oob_tcp_if_include
>>>MCA oob: parameter "oob_tcp_if_include" (current value: 
>>> , data source: environment or cmdline)
>>> $ ompi_info --param all all | grep btl_tcp_if_include
>>>MCA btl: parameter "btl_tcp_if_include" (current value: 
>>> , data source: environment or cmdline)
>>> 
>>> 
>>> But then I get again the hang-up issue!
>>> 
>>> ==> seem, mpiexec does not understand these environment variables! and only 
>>> get the command line options. This should not be so?
>> No, that isn't what is happening. The problem lies in the behavior of 
>> rsh/ssh. This environment does not forward environmental variables. Because 
>> of limits on cmd line length, we don't automatically forward MCA params from 
>> the environment, but only from the cmd line. It is an annoying limitation, 
>> but one outside our control.
> 
> We know about "ssh does not forward environmental variables." But in this 
> case, are these parameters not the parameters of mpiexec itself, too?
> 
> The crucial thing is, that setting of the parameters works over the command 
> line but *does not work* over the envvar way (as in 
> http://www.open-mpi.org/faq/?category=tuning#setting-mca-params described). 
> This looks like a bug for me!
> 
> 
> 
> 
> 
>> Put those envars in the default mca param file and the problem will be 
>> resolved.
> 
> You mean e.g. $prefix/etc/openmpi-mca-params.conf as described in 4. of 
> http://www.open-mpi.org/faq/?category=tuning#setting-mca-params
> 
> Well, this is possible, but not flexible enough for us (because there are 
> some machines which only can run if the parameters are *not* set - on those 
> the ssh goes just over these eth0 devices).
> 
> By now we use the command line parameters and hope the envvar way will work 
> sometimes.
> 
> 
>>> (I also tried to advise to provide the envvars by -x 
>>> OMPI_MCA_oob_tcp_if_include -x OMPI_MCA_btl_tcp_if_include - nothing 
>>> changed.
>> I'm surprised by that - they should be picked up and forwarded. Could be a 
>> bug
> 
> Well, I also mean this is a bug, but as said not on providing the values of 
> envvars but on detecting of these parameters at all. Or maybe on both.
> 
> 
> 
> 
>>> Well, they are OMPI_ variables and should be provided in any case).
>> No, they aren't - they are not treated differently than any other envar.
> 
> [after performing some RTFM...]
> at least the man page of mpiexec says, the OMPI_ environment variables are 
> always provided and thus treated *differently* than other envvars:
> 
> $ man mpiexec
> 
> Exported Environment Variables
>   All environment variables that are named in the form OMPI_* will  
> automatically  be  exported to new processes on the local and remote nodes.
> 
> So, tells the man page lies, or this is an removed feature, or something else?
> 
> 
> Best wishes,
> 
> 

Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-25 Thread Paul Kapinos

Hello again,


Ralph Castain wrote:

Yes, that would indeed break things. The 1.5 series isn't correctly checking 
connections across multiple interfaces until it finds one that works - it just 
uses the first one it sees. :-(

Yahhh!!
This behaviour - catch a random interface and hang forever if something is 
wrong with it - is somewhat less than perfect.

From my perspective - the user's one - OpenMPI should try to use either *all* 
available networks (as 1.4 does...), starting with the high performance 
ones, or *only* those interfaces on which the hostnames from the hostfile are 
bound to.


It is indeed supposed to do the former - as I implied, this is a bug in the 1.5 
series.


Thanks for the clarification. I was not sure whether this is a bug or a 
feature :-)





Also, there should be timeouts (if you cannot connect to a node within a minute 
you probably will never ever be connected...)


We have debated about this for some time - there is a timeout mca param one can 
set, but we'll consider again making it default.


If some connection runs into a timeout a warning would be great (and a hint to 
take off the interface by oob_tcp_if_exclude, btl_tcp_if_exclude).

Should it not?
Maybe you can file it as a "call for enhancement"...


Probably the right approach at this time.


Ahhh.. sorry, I did not understand what you meant.
Did you file it, or did someone else, or should I do it in some way? Or 
should it not be filed at all?








But then I ran into yet another one issue. In 
http://www.open-mpi.org/faq/?category=tuning#setting-mca-params
the way to define MCA parameters over environment variables is described.

I tried it:
$ export OMPI_MCA_oob_tcp_if_include=ib0
$ export OMPI_MCA_btl_tcp_if_include=ib0


I checked it:
$ ompi_info --param all all | grep oob_tcp_if_include
MCA oob: parameter "oob_tcp_if_include" (current value: , 
data source: environment or cmdline)
$ ompi_info --param all all | grep btl_tcp_if_include
MCA btl: parameter "btl_tcp_if_include" (current value: , 
data source: environment or cmdline)


But then I get again the hang-up issue!

==> it seems mpiexec does not understand these environment variables and only 
takes the command line options. This should not be so, should it?


No, that isn't what is happening. The problem lies in the behavior of rsh/ssh. 
This environment does not forward environmental variables. Because of limits on 
cmd line length, we don't automatically forward MCA params from the 
environment, but only from the cmd line. It is an annoying limitation, but one 
outside our control.


We know that "ssh does not forward environment variables." But in this 
case, are these parameters not also parameters of mpiexec itself?


The crucial thing is that setting the parameters works via the command 
line but *does not work* via the envvar way (as described in 
http://www.open-mpi.org/faq/?category=tuning#setting-mca-params). 
This looks like a bug to me!








Put those envars in the default mca param file and the problem will be resolved.


You mean e.g. $prefix/etc/openmpi-mca-params.conf as described in 4. of 
http://www.open-mpi.org/faq/?category=tuning#setting-mca-params
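
i.e., if I understand the format right, a file with one "name = value" pair per line:

oob_tcp_if_include = ib0
btl_tcp_if_include = ib0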


Well, this is possible, but not flexible enough for us (because there 
are some machines which can only run if the parameters are *not* set - 
on those, ssh goes over the eth0 devices).


For now we use the command line parameters and hope the envvar way will 
work someday.




(I also tried to advise to provide the envvars by -x 
OMPI_MCA_oob_tcp_if_include -x OMPI_MCA_btl_tcp_if_include - nothing changed.


I'm surprised by that - they should be picked up and forwarded. Could be a bug


Well, I also think this is a bug, but as said, not in providing the values 
of the envvars but in detecting these parameters at all. Or maybe both.






Well, they are OMPI_ variables and should be provided in any case).


No, they aren't - they are not treated differently than any other envar.


[after performing some RTFM...]
at least the man page of mpiexec says that the OMPI_* environment variables 
are always exported and thus treated *differently* from other envvars:


$ man mpiexec

 Exported Environment Variables
   All environment variables that are named in the form OMPI_* will 
 automatically  be  exported to new processes on the local and remote 
nodes.


So, does the man page lie, or is this a removed feature, or something 
else?



Best wishes,

Paul Kapinos






Specifying both include and exclude should generate an error as those are 
mutually exclusive options - I think this was also missed in early 1.5 releases 
and was recently patched.
HTH
Ralph
On Nov 23, 2011, at 12:14 PM, TERRY DONTJE wrote:

On 11/23/2011 2:02 PM, Paul Kapinos wrote:

Hello Ralph, hello all,

Two news, as usual a good and a bad one.

The good: we believe to find out *why* it hangs

The bad: it seem for me, this is a bug or at least undocumented feature of Open 
MPI /1.5.x.

In 

Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-24 Thread Ralph Castain

On Nov 24, 2011, at 11:49 AM, Paul Kapinos wrote:

> Hello Ralph, Terry, all!
> 
> again, two news: the good one and the second one.
> 
> Ralph Castain wrote:
>> Yes, that would indeed break things. The 1.5 series isn't correctly checking 
>> connections across multiple interfaces until it finds one that works - it 
>> just uses the first one it sees. :-(
> 
> Yahhh!!
> This behaviour - catch a random interface and hang forever if something is 
> wrong with it - is somewhat less than perfect.
> 
> From my perspective - the user's one - OpenMPI should try to use either *all* 
> available networks (as 1.4 does...), starting with the high performance 
> ones, or *only* those interfaces on which the hostnames from the hostfile are 
> bound to.

It is indeed supposed to do the former - as I implied, this is a bug in the 1.5 
series.

> 
> Also, there should be timeouts (if you cannot connect to a node within a 
> minute you probably will never ever be connected...)

We have debated about this for some time - there is a timeout mca param one can 
set, but we'll consider again making it default.

> 
> If some connection runs into a timeout a warning would be great (and a hint 
> to take off the interface by oob_tcp_if_exclude, btl_tcp_if_exclude).
> 
> Should it not?
> Maybe you can file it as a "call for enhancement"...

Probably the right approach at this time.

> 
> 
> 
>> The solution is to specify -mca oob_tcp_if_include ib0. This will direct the 
>> run-time wireup across the IP over IB interface.
>> You will also need the -mca btl_tcp_if_include ib0 as well so the MPI comm 
>> goes exclusively over that network. 
> 
> YES! This works. Adding
> -mca oob_tcp_if_include ib0 -mca btl_tcp_if_include ib0
> to the command line of mpiexec helps me to run the 1.5.x programs, so I 
> believe this is the workaround.
> 
> Many thanks for this hint, Ralph! My fail to not to find it in the FAQ (I was 
> so close :o) http://www.open-mpi.org/faq/?category=tcp#tcp-selection
> 
> But then I ran into yet another one issue. In 
> http://www.open-mpi.org/faq/?category=tuning#setting-mca-params
> the way to define MCA parameters over environment variables is described.
> 
> I tried it:
> $ export OMPI_MCA_oob_tcp_if_include=ib0
> $ export OMPI_MCA_btl_tcp_if_include=ib0
> 
> 
> I checked it:
> $ ompi_info --param all all | grep oob_tcp_if_include
> MCA oob: parameter "oob_tcp_if_include" (current value: 
> , data source: environment or cmdline)
> $ ompi_info --param all all | grep btl_tcp_if_include
> MCA btl: parameter "btl_tcp_if_include" (current value: 
> , data source: environment or cmdline)
> 
> 
> But then I get again the hang-up issue!
> 
> ==> seem, mpiexec does not understand these environment variables! and only 
> get the command line options. This should not be so?

No, that isn't what is happening. The problem lies in the behavior of rsh/ssh. 
This environment does not forward environmental variables. Because of limits on 
cmd line length, we don't automatically forward MCA params from the 
environment, but only from the cmd line. It is an annoying limitation, but one 
outside our control.

Put those envars in the default mca param file and the problem will be resolved.

> 
> (I also tried to advise to provide the envvars by -x 
> OMPI_MCA_oob_tcp_if_include -x OMPI_MCA_btl_tcp_if_include - nothing changed.

I'm surprised by that - they should be picked up and forwarded. Could be a bug
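
(For reference, the -x usage in question would be roughly one of

% mpirun -x OMPI_MCA_oob_tcp_if_include -x OMPI_MCA_btl_tcp_if_include ...   # forward current values
% mpirun -x OMPI_MCA_oob_tcp_if_include=ib0 ...                              # or set the value in place

per the mpirun man page - a sketch, not a verified recipe.)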

> Well, they are OMPI_ variables and should be provided in any case).

No, they aren't - they are not treated differently than any other envar.

> 
> 
> Best wishes and many thanks for all,
> 
> Paul Kapinos
> 
> 
> 
> 
>> Specifying both include and exclude should generate an error as those are 
>> mutually exclusive options - I think this was also missed in early 1.5 
>> releases and was recently patched.
>> HTH
>> Ralph
>> On Nov 23, 2011, at 12:14 PM, TERRY DONTJE wrote:
>>> On 11/23/2011 2:02 PM, Paul Kapinos wrote:
 Hello Ralph, hello all,
 
 Two news, as usual a good and a bad one.
 
 The good: we believe to find out *why* it hangs
 
 The bad: it seem for me, this is a bug or at least undocumented feature of 
 Open MPI /1.5.x.
 
 In detail:
 As said, we see mystery hang-ups if starting on some nodes using some 
 permutation of hostnames. Usually removing "some bad" nodes helps, 
 sometimes a permutation of node names in the hostfile is enough(!). The 
 behaviour is reproducible.
 
 The machines have at least 2 networks:
 
 *eth0* is used for installation, monitoring, ... - this ethernet is very 
 slim
 
 *ib0* - is the "IP over IB" interface and is used for everything: the file 
 systems, ssh and so on. The hostnames are bound to the ib0 network; our 
 idea was not to use eth0 for MPI at all.
 
 all machines are available from any over ib0 (are in one network).
 
 But on eth0 there 

Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-24 Thread Paul Kapinos

Hello Ralph, Terry, all!

again, two news: the good one and the second one.

Ralph Castain wrote:
Yes, that would indeed break things. The 1.5 series isn't correctly 
checking connections across multiple interfaces until it finds one that 
works - it just uses the first one it sees. :-(


Yahhh!!
This behaviour - catch a random interface and hang forever if something 
is wrong with it - is somewhat less than perfect.


From my perspective - the user's one - OpenMPI should try to use either 
*all* available networks (as 1.4 does...), starting with the high 
performance ones, or *only* those interfaces on which the hostnames from 
the hostfile are bound to.


Also, there should be timeouts (if you cannot connect to a node within a 
minute you probably will never ever be connected...)


If some connection runs into a timeout a warning would be great (and a 
hint to take off the interface by oob_tcp_if_exclude, btl_tcp_if_exclude).


Should it not?
Maybe you can file it as a "call for enhancement"...



The solution is to specify -mca oob_tcp_if_include ib0. This will direct 
the run-time wireup across the IP over IB interface.


You will also need the -mca btl_tcp_if_include ib0 as well so the MPI 
comm goes exclusively over that network. 


YES! This works. Adding
-mca oob_tcp_if_include ib0 -mca btl_tcp_if_include ib0
to the command line of mpiexec helps me to run the 1.5.x programs, so I 
believe this is the workaround.
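
The full call now looks roughly like this (process count and hostfile name are 
placeholders, not our real ones):

$ mpiexec -np 32 --hostfile hostfile \
      -mca oob_tcp_if_include ib0 -mca btl_tcp_if_include ib0  MPI_FastTest.exe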


Many thanks for this hint, Ralph! My fault for not finding it in the FAQ 
(I was so close :o) http://www.open-mpi.org/faq/?category=tcp#tcp-selection


But then I ran into yet another one issue. In 
http://www.open-mpi.org/faq/?category=tuning#setting-mca-params

the way to define MCA parameters over environment variables is described.

I tried it:
$ export OMPI_MCA_oob_tcp_if_include=ib0
$ export OMPI_MCA_btl_tcp_if_include=ib0


I checked it:
$ ompi_info --param all all | grep oob_tcp_if_include
 MCA oob: parameter "oob_tcp_if_include" (current 
value: , data source: environment or cmdline)

$ ompi_info --param all all | grep btl_tcp_if_include
 MCA btl: parameter "btl_tcp_if_include" (current 
value: , data source: environment or cmdline)



But then I get again the hang-up issue!

==> it seems mpiexec does not understand these environment variables and 
only takes the command line options. This should not be so, should it?


(I also tried to explicitly forward the envvars with -x 
OMPI_MCA_oob_tcp_if_include -x OMPI_MCA_btl_tcp_if_include - nothing 
changed. Well, they are OMPI_* variables and should be forwarded in any case.)



Best wishes and many thanks for all,

Paul Kapinos




Specifying both include and 
exclude should generate an error as those are mutually exclusive options 
- I think this was also missed in early 1.5 releases and was recently 
patched.


HTH
Ralph


On Nov 23, 2011, at 12:14 PM, TERRY DONTJE wrote:


On 11/23/2011 2:02 PM, Paul Kapinos wrote:

Hello Ralph, hello all,

Two news, as usual a good and a bad one.

The good: we believe to find out *why* it hangs

The bad: it seem for me, this is a bug or at least undocumented 
feature of Open MPI /1.5.x.


In detail:
As said, we see mystery hang-ups if starting on some nodes using some 
permutation of hostnames. Usually removing "some bad" nodes helps, 
sometimes a permutation of node names in the hostfile is enough(!). 
The behaviour is reproducible.


The machines have at least 2 networks:

*eth0* is used for installation, monitoring, ... - this ethernet is 
very slim


*ib0* - is the "IP over IB" interface and is used for everything: the 
file systems, ssh and so on. The hostnames are bound to the ib0 
network; our idea was not to use eth0 for MPI at all.


all machines are available from any over ib0 (are in one network).

But on eth0 there are at least two different networks; especially the 
computer linuxbsc025 is in different network than the others and is 
not reachable from other nodes over eth0! (but reachable over ib0. 
The name used in the hostfile is resolved to the IP of ib0 ).


So I believe that Open MPI /1.5.x tries to communicate over eth0 and 
cannot do it, and hangs. The /1.4.3 does not hang, so this issue is 
1.5.x-specific (seen in 1.5.3 and 1.5.4). A bug?


I also tried to disable the eth0 completely:

$ mpiexec -mca btl_tcp_if_exclude eth0,lo  -mca btl_tcp_if_include 
ib0 ...


I believe if you give "-mca btl_tcp_if_include ib0" you do not need to 
specify the exclude parameter.
...but this does not help. All right, the above command should 
disable the usage of eth0 for MPI communication itself, but it hangs 
just before the MPI is started, isn't it? (because one process lacks, 
the MPI_INIT cannot be passed)


By "just before the MPI is started" do you mean while orte is 
launching the processes.
I wonder if you need to specify "-mca oob_tcp_if_include ib0" also but 
I think that may depend on which oob you are using.
Now a question: is there a way to forbid the mpiexec to use some 
interfaces at all?

Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-23 Thread Ralph Castain
Yes, that would indeed break things. The 1.5 series isn't correctly checking 
connections across multiple interfaces until it finds one that works - it just 
uses the first one it sees. :-(

The solution is to specify -mca oob_tcp_if_include ib0. This will direct the 
run-time wireup across the IP over IB interface.

You will also need the -mca btl_tcp_if_include ib0 as well so the MPI comm goes 
exclusively over that network. Specifying both include and exclude should 
generate an error as those are mutually exclusive options - I think this was 
also missed in early 1.5 releases and was recently patched.
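
(That is, use one or the other - for example:

mpiexec -mca btl_tcp_if_include ib0 ...          # include only ib0
mpiexec -mca btl_tcp_if_exclude eth0,lo ...      # or exclude eth0 and lo

but not both on the same command line.)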

HTH
Ralph


On Nov 23, 2011, at 12:14 PM, TERRY DONTJE wrote:

> On 11/23/2011 2:02 PM, Paul Kapinos wrote:
>> 
>> Hello Ralph, hello all, 
>> 
>> Two news, as usual a good and a bad one. 
>> 
>> The good: we believe to find out *why* it hangs 
>> 
>> The bad: it seem for me, this is a bug or at least undocumented feature of 
>> Open MPI /1.5.x. 
>> 
>> In detail: 
>> As said, we see mystery hang-ups if starting on some nodes using some 
>> permutation of hostnames. Usually removing "some bad" nodes helps, sometimes 
>> a permutation of node names in the hostfile is enough(!). The behaviour is 
>> reproducible. 
>> 
>> The machines have at least 2 networks: 
>> 
>> *eth0* is used for installation, monitoring, ... - this ethernet is very 
>> slim 
>> 
>> *ib0* - is the "IP over IB" interface and is used for everything: the file 
>> systems, ssh and so on. The hostnames are bound to the ib0 network; our idea 
>> was not to use eth0 for MPI at all. 
>> 
>> all machines are available from any over ib0 (are in one network). 
>> 
>> But on eth0 there are at least two different networks; especially the 
>> computer linuxbsc025 is in different network than the others and is not 
>> reachable from other nodes over eth0! (but reachable over ib0. The name used 
>> in the hostfile is resolved to the IP of ib0 ). 
>> 
>> So I believe that Open MPI /1.5.x tries to communicate over eth0 and cannot 
>> do it, and hangs. The /1.4.3 does not hang, so this issue is 1.5.x-specific 
>> (seen in 1.5.3 and 1.5.4). A bug? 
>> 
>> I also tried to disable the eth0 completely: 
>> 
>> $ mpiexec -mca btl_tcp_if_exclude eth0,lo  -mca btl_tcp_if_include ib0 ... 
>> 
> I believe if you give "-mca btl_tcp_if_include ib0" you do not need to 
> specify the exclude parameter.
>> ...but this does not help. All right, the above command should disable the 
>> usage of eth0 for MPI communication itself, but it hangs just before the MPI 
>> is started, isn't it? (because one process lacks, the MPI_INIT cannot be 
>> passed) 
>> 
> By "just before the MPI is started" do you mean while orte is launching the 
> processes.
> I wonder if you need to specify "-mca oob_tcp_if_include ib0" also but I 
> think that may depend on which oob you are using.
>> Now a question: is there a way to forbid the mpiexec to use some interfaces 
>> at all? 
>> 
>> Best wishes, 
>> 
>> Paul Kapinos 
>> 
>> P.S. Of course we know about the good idea to bring all nodes into the same 
>> net on eth0, but at this point it is impossible due of technical 
>> reason[s]... 
>> 
>> P.S.2 I'm not sure that the issue is really rooted in the above mentioned 
>> misconfiguration of eth0, but I have no better idea at this point... 
>> 
>> 
 The map seem to be correctly build, also the output if the daemons seem to 
 be the same (see helloworld.txt) 
>>> 
>>> Unfortunately, it appears that OMPI was not built with --enable-debug as 
>>> there is no debug info in the output. Without a debug installation of OMPI, 
>>> the ability to determine the problem is pretty limited. 
>> 
>> well, this will be the next option we will activate. We also have another 
>> issue here, on (not) using uDAPL.. 
>> 
>> 
>>> 
>>> 
> You should also try putting that long list of nodes in a hostfile - see 
> if that makes a difference. 
> It will process the nodes thru a different code path, so if there is some 
> problem in --host, 
> this will tell us. 
 No, with the host file instead of host list on command line the behaviour 
 is the same. 
 
 But, I just found out that the 1.4.3 does *not* hang on this 
 constellation. The next thing I will try will be the installation of 1.5.4 
 :o) 
 
 Best, 
 
 Paul 
 
 P.S. started: 
 
 $ /opt/MPI/openmpi-1.5.3/linux/intel/bin/mpiexec --hostfile hostfile-mini 
 -mca odls_base_verbose 5 --leave-session-attached --display-map  
 helloworld 2>&1 | tee helloworld.txt 
 
 
 
> On Nov 21, 2011, at 9:33 AM, Paul Kapinos wrote: 
>> Hello Open MPI volks, 
>> 
>> We use OpenMPI 1.5.3 on our pretty new 1800+ nodes InfiniBand cluster, 
>> and we have some strange hangups if starting OpenMPI processes. 
>> 
>> The nodes are named linuxbsc001,linuxbsc002,... (with some lacuna due of 
>>  offline nodes). Each node is accessible from each other 

Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-23 Thread TERRY DONTJE

On 11/23/2011 2:02 PM, Paul Kapinos wrote:

Hello Ralph, hello all,

Two news, as usual a good and a bad one.

The good: we believe to find out *why* it hangs

The bad: it seem for me, this is a bug or at least undocumented 
feature of Open MPI /1.5.x.


In detail:
As said, we see mystery hang-ups if starting on some nodes using some 
permutation of hostnames. Usually removing "some bad" nodes helps, 
sometimes a permutation of node names in the hostfile is enough(!). 
The behaviour is reproducible.


The machines have at least 2 networks:

*eth0* is used for installation, monitoring, ... - this ethernet is 
very slim


*ib0* - is the "IP over IB" interface and is used for everything: the 
file systems, ssh and so on. The hostnames are bound to the ib0 
network; our idea was not to use eth0 for MPI at all.


all machines are available from any over ib0 (are in one network).

But on eth0 there are at least two different networks; especially the 
computer linuxbsc025 is in different network than the others and is 
not reachable from other nodes over eth0! (but reachable over ib0. The 
name used in the hostfile is resolved to the IP of ib0 ).


So I believe that Open MPI /1.5.x tries to communicate over eth0 and 
cannot do it, and hangs. The /1.4.3 does not hang, so this issue is 
1.5.x-specific (seen in 1.5.3 and 1.5.4). A bug?


I also tried to disable the eth0 completely:

$ mpiexec -mca btl_tcp_if_exclude eth0,lo  -mca btl_tcp_if_include ib0 
...


I believe if you give "-mca btl_tcp_if_include ib0" you do not need to 
specify the exclude parameter.
...but this does not help. All right, the above command should disable 
the usage of eth0 for MPI communication itself, but it hangs just 
before the MPI is started, isn't it? (because one process lacks, the 
MPI_INIT cannot be passed)


By "just before the MPI is started" do you mean while orte is launching 
the processes.
I wonder if you need to specify "-mca oob_tcp_if_include ib0" also but I 
think that may depend on which oob you are using.
Now a question: is there a way to forbid the mpiexec to use some 
interfaces at all?


Best wishes,

Paul Kapinos

P.S. Of course we know about the good idea to bring all nodes into the 
same net on eth0, but at this point it is impossible due of technical 
reason[s]...


P.S.2 I'm not sure that the issue is really rooted in the above 
mentioned misconfiguration of eth0, but I have no better idea at this 
point...



The map seem to be correctly build, also the output if the daemons 
seem to be the same (see helloworld.txt)


Unfortunately, it appears that OMPI was not built with --enable-debug 
as there is no debug info in the output. Without a debug installation 
of OMPI, the ability to determine the problem is pretty limited.


well, this will be the next option we will activate. We also have 
another issue here, on (not) using uDAPL..






You should also try putting that long list of nodes in a hostfile - 
see if that makes a difference.
It will process the nodes thru a different code path, so if there 
is some problem in --host,

this will tell us.
No, with the host file instead of host list on command line the 
behaviour is the same.


But, I just found out that the 1.4.3 does *not* hang on this 
constellation. The next thing I will try will be the installation of 
1.5.4 :o)


Best,

Paul

P.S. started:

$ /opt/MPI/openmpi-1.5.3/linux/intel/bin/mpiexec --hostfile 
hostfile-mini -mca odls_base_verbose 5 --leave-session-attached 
--display-map  helloworld 2>&1 | tee helloworld.txt





On Nov 21, 2011, at 9:33 AM, Paul Kapinos wrote:

Hello Open MPI folks,

We use OpenMPI 1.5.3 on our pretty new 1800+ nodes InfiniBand 
cluster, and we have some strange hangups if starting OpenMPI 
processes.


The nodes are named linuxbsc001,linuxbsc002,... (with some lacuna 
due of  offline nodes). Each node is accessible from each other 
over SSH (without password), also MPI programs between any two 
nodes are checked to run.



So long, I tried to start some bigger number of processes, one 
process per node:

$ mpiexec -np NN  --host linuxbsc001,linuxbsc002,... MPI_FastTest.exe

Now the problem: there are some constellations of names in the 
host list on which mpiexec reproducible hangs forever; and more 
surprising: other *permutation* of the *same* node names may run 
without any errors!


Example: the command in laueft.txt runs OK, the command in 
haengt.txt hangs. Note: the only difference is that the node 
linuxbsc025 is put on the end of the host list. Amazed, too?


Looking on the particular nodes during the above mpiexec hangs, we 
found the orted daemons started on *each* node and the binary on 
all but one node (orted.txt, MPI_FastTest.txt).
Again amazing that the node with no user process started (leading 
to hangup in MPI_Init of all processes and thus to hangup, I 
believe) was always the same, linuxbsc005, which is NOT the 
permuted item linuxbsc025...


This behaviour is reproducible. The hang-up only occurs if the started 
application is an MPI application ("hostname" does not hang).

Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-23 Thread Paul Kapinos

Hello Ralph, hello all,

Two news, as usual a good and a bad one.

The good: we believe we found out *why* it hangs.

The bad: it seems to me this is a bug, or at least an undocumented feature, 
of Open MPI 1.5.x.


In detail:
As said, we see mysterious hang-ups when starting on some nodes using some 
permutations of hostnames. Usually removing "some bad" nodes helps; 
sometimes a permutation of the node names in the hostfile is enough(!). The 
behaviour is reproducible.


The machines have at least 2 networks:

*eth0* is used for installation, monitoring, ... - this ethernet is very 
slim


*ib0* - is the "IP over IB" interface and is used for everything: the 
file systems, ssh and so on. The hostnames are bound to the ib0 network; 
our idea was not to use eth0 for MPI at all.


all machines are reachable from any other over ib0 (they are all in one network).

But on eth0 there are at least two different networks; in particular the 
computer linuxbsc025 is in a different network than the others and is not 
reachable from the other nodes over eth0! (It is reachable over ib0; the 
name used in the hostfile resolves to the IP of ib0.)


So I believe that Open MPI 1.5.x tries to communicate over eth0, cannot do 
it, and hangs. Version 1.4.3 does not hang, so this issue is 
1.5.x-specific (seen in 1.5.3 and 1.5.4). A bug?


I also tried to disable the eth0 completely:

$ mpiexec -mca btl_tcp_if_exclude eth0,lo  -mca btl_tcp_if_include ib0 ...

...but this does not help. All right, the above command should disable 
the usage of eth0 for the MPI communication itself, but it hangs just before 
MPI is started, doesn't it? (Because one process is missing, MPI_Init 
cannot complete.)


Now a question: is there a way to forbid the mpiexec to use some 
interfaces at all?


Best wishes,

Paul Kapinos

P.S. Of course we know it would be a good idea to bring all nodes into the 
same net on eth0, but at this point it is impossible due to technical 
reason[s]...


P.S.2 I'm not sure that the issue is really rooted in the above 
mentioned misconfiguration of eth0, but I have no better idea at this 
point...




The map seem to be correctly build, also the output if the daemons seem to be 
the same (see helloworld.txt)


Unfortunately, it appears that OMPI was not built with --enable-debug as there 
is no debug info in the output. Without a debug installation of OMPI, the 
ability to determine the problem is pretty limited.


well, this will be the next option we will activate. We also have 
another issue here, on (not) using uDAPL..







You should also try putting that long list of nodes in a hostfile - see if that 
makes a difference.
It will process the nodes thru a different code path, so if there is some 
problem in --host,
this will tell us.

No, with the host file instead of host list on command line the behaviour is 
the same.

But, I just found out that the 1.4.3 does *not* hang on this constellation. The 
next thing I will try will be the installation of 1.5.4 :o)

Best,

Paul

P.S. started:

$ /opt/MPI/openmpi-1.5.3/linux/intel/bin/mpiexec --hostfile hostfile-mini -mca 
odls_base_verbose 5 --leave-session-attached --display-map  helloworld 2>&1 | 
tee helloworld.txt
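
(hostfile-mini is just one node name per line - illustrative, not the actual file:

linuxbsc001
linuxbsc002
linuxbsc005
linuxbsc025
)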




On Nov 21, 2011, at 9:33 AM, Paul Kapinos wrote:

Hello Open MPI folks,

We use OpenMPI 1.5.3 on our pretty new 1800+ nodes InfiniBand cluster, and we 
have some strange hangups if starting OpenMPI processes.

The nodes are named linuxbsc001,linuxbsc002,... (with some gaps due to 
offline nodes). Each node is accessible from every other over SSH (without 
password), and MPI programs between any two nodes have been checked to run.
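
(i.e. checks of roughly this kind were done beforehand - a sketch:

$ ssh linuxbsc002 hostname                                        # passwordless ssh works
$ mpiexec -np 2 --host linuxbsc001,linuxbsc002 MPI_FastTest.exe   # a two-node MPI run works
)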


So far, I have tried to start a bigger number of processes, one process per node:
$ mpiexec -np NN  --host linuxbsc001,linuxbsc002,... MPI_FastTest.exe

Now the problem: there are some constellations of names in the host list on 
which mpiexec reproducibly hangs forever; and more surprisingly: another 
*permutation* of the *same* node names may run without any errors!

Example: the command in laueft.txt ("runs") runs OK, the command in haengt.txt 
("hangs") hangs. Note: the only difference is that the node linuxbsc025 is put 
at the end of the host list. Amazed, too?
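
(Illustrative only - these are not the real file contents, just the pattern:

mpiexec -np 4 --host linuxbsc025,linuxbsc001,linuxbsc002,linuxbsc005 MPI_FastTest.exe   # runs
mpiexec -np 4 --host linuxbsc001,linuxbsc002,linuxbsc005,linuxbsc025 MPI_FastTest.exe   # hangs
)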

Looking at the particular nodes while the above mpiexec hangs, we found the 
orted daemons started on *each* node and the binary on all but one node 
(orted.txt, MPI_FastTest.txt).
Again amazing: the node with no user process started (leading, I believe, to 
the hang-up in MPI_Init of all processes) was always the same one, 
linuxbsc005, which is NOT the permuted item linuxbsc025...

This behaviour is reproducible. The hang-up only occurs if the started application is an 
MPI application ("hostname" does not hang).


Any idea what is going on?


Best,

Paul Kapinos


P.S: no alias names used, all names are real ones







--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915

Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-22 Thread Ralph Castain

On Nov 22, 2011, at 10:10 AM, Paul Kapinos wrote:

> Hello Ralph, hello all.
> 
>> No real ideas, I'm afraid. We regularly launch much larger jobs than that 
>> using ssh without problem,
> I was also able to run a 288-node-job yesterday - the size alone is not the 
> problem...
> 
> 
> 
>> so it is likely something about the local setup of that node that is causing 
>> the problem. Offhand, it sounds like either the mapper isn't getting things 
>> right, or for some reason the daemon on 005 isn't properly getting or 
>> processing the launch command.
>> What you could try is adding --display-map to see if the map is being 
>> correctly generated.
> > If that works, then (using a debug build) try adding 
> > --leave-session-attached and see if
> > any daemons are outputting an error.
>> You could add -mca odls_base_verbose 5 --leave-session-attached to your cmd 
>> line. 
> > You'll see debug output from each daemon as it receives and processes
>> the launch command.  See if the daemon on 005 is behaving differently than 
>> the others.
> 
> I've tried the options.
> The map seem to be correctly build, also the output if the daemons seem to be 
> the same (see helloworld.txt)

Unfortunately, it appears that OMPI was not built with --enable-debug as there 
is no debug info in the output. Without a debug installation of OMPI, the 
ability to determine the problem is pretty limited.
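
(A debug installation is configured with --enable-debug, e.g. - a sketch, the other 
configure options depend on your local installation:

$ ./configure --prefix=/opt/MPI/openmpi-1.5.3/linux/intel --enable-debug ...
$ make all install
)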


> 
>> You should also try putting that long list of nodes in a hostfile - see if 
>> that makes a difference.
> > It will process the nodes thru a different code path, so if there is some 
> > problem in --host,
>> this will tell us.
> 
> No, with the host file instead of host list on command line the behaviour is 
> the same.
> 
> But, I just found out that the 1.4.3 does *not* hang on this constellation. 
> The next thing I will try will be the installation of 1.5.4 :o)
> 
> Best,
> 
> Paul
> 
> P.S. started:
> 
> $ /opt/MPI/openmpi-1.5.3/linux/intel/bin/mpiexec --hostfile hostfile-mini 
> -mca odls_base_verbose 5 --leave-session-attached --display-map  helloworld 
> 2>&1 | tee helloworld.txt
> 
> 
> 
>> On Nov 21, 2011, at 9:33 AM, Paul Kapinos wrote:
>>> Hello Open MPI volks,
>>> 
>>> We use OpenMPI 1.5.3 on our pretty new 1800+ nodes InfiniBand cluster, and 
>>> we have some strange hangups if starting OpenMPI processes.
>>> 
>>> The nodes are named linuxbsc001,linuxbsc002,... (with some lacuna due of  
>>> offline nodes). Each node is accessible from each other over SSH (without 
>>> password), also MPI programs between any two nodes are checked to run.
>>> 
>>> 
>>> So long, I tried to start some bigger number of processes, one process per 
>>> node:
>>> $ mpiexec -np NN  --host linuxbsc001,linuxbsc002,... MPI_FastTest.exe
>>> 
>>> Now the problem: there are some constellations of names in the host list on
>>> which mpiexec reproducibly hangs forever; and, more surprisingly, another
>>> *permutation* of the *same* node names may run without any errors!
>>> 
>>> Example: the command in laueft.txt runs OK, the command in haengt.txt
>>> hangs. Note: the only difference is that the node linuxbsc025 is put at the
>>> end of the host list. Amazing, isn't it?
>>> 
>>> Looking at the particular nodes while the above mpiexec hangs, we found
>>> the orted daemons started on *each* node and the application binary on all
>>> but one node (orted.txt, MPI_FastTest.txt).
>>> Even more amazing: the node with no user process started (leading, I
>>> believe, to a hang-up in MPI_Init of all processes and thus to the overall
>>> hang) was always the same, linuxbsc005, which is NOT the permuted item
>>> linuxbsc025...
>>> 
>>> This behaviour is reproducible. The hang-up only occurs if the started
>>> application is an MPI application ("hostname" does not hang).
>>> 
>>> 
>>> Any idea what is going on?
>>> 
>>> 
>>> Best,
>>> 
>>> Paul Kapinos
>>> 
>>> 
>>> P.S: no alias names used, all names are real ones
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
>>> RWTH Aachen University, Center for Computing and Communication
>>> Seffenter Weg 23,  D 52074  Aachen (Germany)
>>> Tel: +49 241/80-24915
>>> linuxbsc001: STDOUT: 24323 ?  SLl  0:00 MPI_FastTest.exe
>>> linuxbsc002: STDOUT:  2142 ?  SLl  0:00 MPI_FastTest.exe
>>> linuxbsc003: STDOUT: 69266 ?  SLl  0:00 MPI_FastTest.exe
>>> linuxbsc004: STDOUT: 58899 ?  SLl  0:00 MPI_FastTest.exe
>>> linuxbsc006: STDOUT: 68255 ?  SLl  0:00 MPI_FastTest.exe
>>> linuxbsc007: STDOUT: 62026 ?  SLl  0:00 MPI_FastTest.exe
>>> linuxbsc008: STDOUT: 54221 ?  SLl  0:00 MPI_FastTest.exe
>>> linuxbsc009: STDOUT: 55482 ?  SLl  0:00 MPI_FastTest.exe
>>> linuxbsc010: STDOUT: 59380 ?  SLl  0:00 MPI_FastTest.exe
>>> linuxbsc011: STDOUT: 58312 ?  SLl  0:00 MPI_FastTest.exe
>>> linuxbsc014: STDOUT: 56013 ?  SLl  0:00 MPI_FastTest.exe
>>> linuxbsc016: STDOUT: 58563 ?  SLl  0:00 MPI_FastTest.exe
>>> 

Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-22 Thread Paul Kapinos

Hello Ralph, hello all.


> No real ideas, I'm afraid. We regularly launch much larger jobs than that
> using ssh without problem,

I was also able to run a 288-node job yesterday - the size alone is not
the problem...


> so it is likely something about the local setup of that node that is causing
> the problem. Offhand, it sounds like either the mapper isn't getting things
> right, or for some reason the daemon on 005 isn't properly getting or
> processing the launch command.
>
> What you could try is adding --display-map to see if the map is being
> correctly generated. If that works, then (using a debug build) try adding
> --leave-session-attached and see if any daemons are outputting an error.
>
> You could add -mca odls_base_verbose 5 --leave-session-attached to your cmd
> line. You'll see debug output from each daemon as it receives and processes
> the launch command.  See if the daemon on 005 is behaving differently than
> the others.


I've tried the options.
The map seems to be correctly built; also, the output of the daemons seems
to be the same (see helloworld.txt)


> You should also try putting that long list of nodes in a hostfile - see if
> that makes a difference. It will process the nodes thru a different code
> path, so if there is some problem in --host, this will tell us.


No, with a hostfile instead of the host list on the command line, the
behaviour is the same.


But I just found out that 1.4.3 does *not* hang on this constellation.
The next thing I will try is installing 1.5.4 :o)


Best,

Paul

P.S. started:

$ /opt/MPI/openmpi-1.5.3/linux/intel/bin/mpiexec --hostfile 
hostfile-mini -mca odls_base_verbose 5 --leave-session-attached 
--display-map  helloworld 2>&1 | tee helloworld.txt







On Nov 21, 2011, at 9:33 AM, Paul Kapinos wrote:


Hello Open MPI folks,

We use Open MPI 1.5.3 on our pretty new 1800+ node InfiniBand cluster, and we
see some strange hang-ups when starting Open MPI processes.

The nodes are named linuxbsc001,linuxbsc002,... (with some gaps due to
offline nodes). Each node is accessible from every other node over SSH (without
a password), and MPI programs between any two nodes have been verified to run.


So far, I tried to start a larger number of processes, one process per node:
$ mpiexec -np NN  --host linuxbsc001,linuxbsc002,... MPI_FastTest.exe

Now the problem: there are some constellations of names in the host list on
which mpiexec reproducibly hangs forever; and, more surprisingly, another
*permutation* of the *same* node names may run without any errors!

Example: the command in laueft.txt runs OK, the command in haengt.txt hangs.
Note: the only difference is that the node linuxbsc025 is put at the end of the
host list. Amazing, isn't it?

Looking at the particular nodes while the above mpiexec hangs, we found the
orted daemons started on *each* node and the application binary on all but one
node (orted.txt, MPI_FastTest.txt).
Even more amazing: the node with no user process started (leading, I believe,
to a hang-up in MPI_Init of all processes and thus to the overall hang) was
always the same, linuxbsc005, which is NOT the permuted item linuxbsc025...

This behaviour is reproducible. The hang-up only occurs if the started
application is an MPI application ("hostname" does not hang).


Any idea what is going on?


Best,

Paul Kapinos


P.S: no alias names used, all names are real ones







--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915
linuxbsc001: STDOUT: 24323 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc002: STDOUT:  2142 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc003: STDOUT: 69266 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc004: STDOUT: 58899 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc006: STDOUT: 68255 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc007: STDOUT: 62026 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc008: STDOUT: 54221 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc009: STDOUT: 55482 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc010: STDOUT: 59380 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc011: STDOUT: 58312 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc014: STDOUT: 56013 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc016: STDOUT: 58563 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc017: STDOUT: 54693 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc018: STDOUT: 54187 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc020: STDOUT: 55811 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc021: STDOUT: 54982 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc022: STDOUT: 50032 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc023: STDOUT: 54044 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc024: STDOUT: 51247 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc025: STDOUT: 18575 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc027: STDOUT: 48969 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc028: STDOUT: 52397 ?  SLl  0:00 

Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-21 Thread Ralph Castain
No real ideas, I'm afraid. We regularly launch much larger jobs than that using 
ssh without problem, so it is likely something about the local setup of that 
node that is causing the problem. Offhand, it sounds like either the mapper 
isn't getting things right, or for some reason the daemon on 005 isn't properly 
getting or processing the launch command.

What you could try is adding --display-map to see if the map is being correctly 
generated. If that works, then (using a debug build) try adding 
--leave-session-attached and see if any daemons are outputting an error.

You could add -mca odls_base_verbose 5 --leave-session-attached to your cmd 
line. You'll see debug output from each daemon as it receives and processes the 
launch command. See if the daemon on 005 is behaving differently than the 
others.

You should also try putting that long list of nodes in a hostfile - see if that 
makes a difference. It will process the nodes thru a different code path, so if 
there is some problem in --host, this will tell us.
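
For illustration only - a hostfile is just one node name per line (optionally
with a slots=N count) and is passed via --hostfile; the file name and node
names below are merely an example:

$ cat my_hostfile
linuxbsc001
linuxbsc002
linuxbsc003
$ mpiexec -np 3 --hostfile my_hostfile --display-map \
    -mca odls_base_verbose 5 --leave-session-attached MPI_FastTest.exe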


On Nov 21, 2011, at 9:33 AM, Paul Kapinos wrote:

> Hello Open MPI folks,
> 
> We use Open MPI 1.5.3 on our pretty new 1800+ node InfiniBand cluster, and we
> see some strange hang-ups when starting Open MPI processes.
> 
> The nodes are named linuxbsc001,linuxbsc002,... (with some gaps due to
> offline nodes). Each node is accessible from every other node over SSH
> (without a password), and MPI programs between any two nodes have been
> verified to run.
> 
> 
> So far, I tried to start a larger number of processes, one process per
> node:
> $ mpiexec -np NN  --host linuxbsc001,linuxbsc002,... MPI_FastTest.exe
> 
> Now the problem: there are some constellations of names in the host list on
> which mpiexec reproducibly hangs forever; and, more surprisingly, another
> *permutation* of the *same* node names may run without any errors!
> 
> Example: the command in laueft.txt runs OK, the command in haengt.txt hangs.
> Note: the only difference is that the node linuxbsc025 is put at the end of
> the host list. Amazing, isn't it?
> 
> Looking at the particular nodes while the above mpiexec hangs, we found the
> orted daemons started on *each* node and the application binary on all but
> one node (orted.txt, MPI_FastTest.txt).
> Even more amazing: the node with no user process started (leading, I believe,
> to a hang-up in MPI_Init of all processes and thus to the overall hang) was
> always the same, linuxbsc005, which is NOT the permuted item linuxbsc025...
> 
> This behaviour is reproducible. The hang-up only occurs if the started
> application is an MPI application ("hostname" does not hang).
> 
> 
> Any idea what is going on?
> 
> 
> Best,
> 
> Paul Kapinos
> 
> 
> P.S: no alias names used, all names are real ones
> 
> 
> 
> 
> 
> 
> 
> -- 
> Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
> RWTH Aachen University, Center for Computing and Communication
> Seffenter Weg 23,  D 52074  Aachen (Germany)
> Tel: +49 241/80-24915
> linuxbsc001: STDOUT: 24323 ?  SLl  0:00 MPI_FastTest.exe
> linuxbsc002: STDOUT:  2142 ?  SLl  0:00 MPI_FastTest.exe
> linuxbsc003: STDOUT: 69266 ?  SLl  0:00 MPI_FastTest.exe
> linuxbsc004: STDOUT: 58899 ?  SLl  0:00 MPI_FastTest.exe
> linuxbsc006: STDOUT: 68255 ?  SLl  0:00 MPI_FastTest.exe
> linuxbsc007: STDOUT: 62026 ?  SLl  0:00 MPI_FastTest.exe
> linuxbsc008: STDOUT: 54221 ?  SLl  0:00 MPI_FastTest.exe
> linuxbsc009: STDOUT: 55482 ?  SLl  0:00 MPI_FastTest.exe
> linuxbsc010: STDOUT: 59380 ?  SLl  0:00 MPI_FastTest.exe
> linuxbsc011: STDOUT: 58312 ?  SLl  0:00 MPI_FastTest.exe
> linuxbsc014: STDOUT: 56013 ?  SLl  0:00 MPI_FastTest.exe
> linuxbsc016: STDOUT: 58563 ?  SLl  0:00 MPI_FastTest.exe
> linuxbsc017: STDOUT: 54693 ?  SLl  0:00 MPI_FastTest.exe
> linuxbsc018: STDOUT: 54187 ?  SLl  0:00 MPI_FastTest.exe
> linuxbsc020: STDOUT: 55811 ?  SLl  0:00 MPI_FastTest.exe
> linuxbsc021: STDOUT: 54982 ?  SLl  0:00 MPI_FastTest.exe
> linuxbsc022: STDOUT: 50032 ?  SLl  0:00 MPI_FastTest.exe
> linuxbsc023: STDOUT: 54044 ?  SLl  0:00 MPI_FastTest.exe
> linuxbsc024: STDOUT: 51247 ?  SLl  0:00 MPI_FastTest.exe
> linuxbsc025: STDOUT: 18575 ?  SLl  0:00 MPI_FastTest.exe
> linuxbsc027: STDOUT: 48969 ?  SLl  0:00 MPI_FastTest.exe
> linuxbsc028: STDOUT: 52397 ?  SLl  0:00 MPI_FastTest.exe
> linuxbsc029: STDOUT: 52780 ?  SLl  0:00 MPI_FastTest.exe
> linuxbsc030: STDOUT: 47537 ?  SLl  0:00 MPI_FastTest.exe
> linuxbsc031: STDOUT: 54609 ?  SLl  0:00 MPI_FastTest.exe
> linuxbsc032: STDOUT: 52833 ?  SLl  0:00 MPI_FastTest.exe
> $ timex /opt/MPI/openmpi-1.5.3/linux/intel/bin/mpiexec -np 27  --host 
> 

[OMPI users] How are the Open MPI processes spawned?

2011-11-21 Thread Paul Kapinos

Hello Open MPI folks,

We use Open MPI 1.5.3 on our pretty new 1800+ node InfiniBand cluster,
and we see some strange hang-ups when starting Open MPI processes.


The nodes are named linuxbsc001,linuxbsc002,... (with some gaps due to
offline nodes). Each node is accessible from every other node over SSH
(without a password), and MPI programs between any two nodes have been
verified to run.



So far, I tried to start a larger number of processes, one process
per node:

$ mpiexec -np NN  --host linuxbsc001,linuxbsc002,... MPI_FastTest.exe

Now the problem: there are some constellations of names in the host list
on which mpiexec reproducibly hangs forever; and, more surprisingly,
another *permutation* of the *same* node names may run without any errors!


Example: the command in laueft.txt runs OK, the command in haengt.txt
hangs. Note: the only difference is that the node linuxbsc025 is put at
the end of the host list. Amazing, isn't it?


Looking at the particular nodes while the above mpiexec hangs, we found
the orted daemons started on *each* node and the application binary on all
but one node (orted.txt, MPI_FastTest.txt).
Even more amazing: the node with no user process started (leading, I
believe, to a hang-up in MPI_Init of all processes and thus to the overall
hang) was always the same, linuxbsc005, which is NOT the permuted item
linuxbsc025...


This behaviour is reproducible. The hang-up only occurs if the started
application is an MPI application ("hostname" does not hang).
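
For example, which nodes actually got the application process can be checked
with a small loop over the host list (node names and the grep pattern below
are just an example):

$ for n in linuxbsc001 linuxbsc005 linuxbsc025; do \
    echo "== $n"; ssh $n 'ps ax | grep -E "orted|MPI_FastTest" | grep -v grep'; \
  done

The per-node listings attached below show output of this kind.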



Any idea what is going on?


Best,

Paul Kapinos


P.S: no alias names used, all names are real ones







--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915
linuxbsc001: STDOUT: 24323 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc002: STDOUT:  2142 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc003: STDOUT: 69266 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc004: STDOUT: 58899 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc006: STDOUT: 68255 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc007: STDOUT: 62026 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc008: STDOUT: 54221 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc009: STDOUT: 55482 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc010: STDOUT: 59380 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc011: STDOUT: 58312 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc014: STDOUT: 56013 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc016: STDOUT: 58563 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc017: STDOUT: 54693 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc018: STDOUT: 54187 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc020: STDOUT: 55811 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc021: STDOUT: 54982 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc022: STDOUT: 50032 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc023: STDOUT: 54044 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc024: STDOUT: 51247 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc025: STDOUT: 18575 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc027: STDOUT: 48969 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc028: STDOUT: 52397 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc029: STDOUT: 52780 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc030: STDOUT: 47537 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc031: STDOUT: 54609 ?  SLl  0:00 MPI_FastTest.exe
linuxbsc032: STDOUT: 52833 ?  SLl  0:00 MPI_FastTest.exe
$ timex /opt/MPI/openmpi-1.5.3/linux/intel/bin/mpiexec -np 27  --host 
linuxbsc001,linuxbsc002,linuxbsc003,linuxbsc004,linuxbsc005,linuxbsc006,linuxbsc007,linuxbsc008,linuxbsc009,linuxbsc010,linuxbsc011,linuxbsc014,linuxbsc016,linuxbsc017,linuxbsc018,linuxbsc020,linuxbsc021,linuxbsc022,linuxbsc023,linuxbsc024,linuxbsc025,linuxbsc027,linuxbsc028,linuxbsc029,linuxbsc030,linuxbsc031,linuxbsc032 MPI_FastTest.exe
$ timex /opt/MPI/openmpi-1.5.3/linux/intel/bin/mpiexec -np 27  --host 
linuxbsc001,linuxbsc002,linuxbsc003,linuxbsc004,linuxbsc005,linuxbsc006,linuxbsc007,linuxbsc008,linuxbsc009,linuxbsc010,linuxbsc011,linuxbsc014,linuxbsc016,linuxbsc017,linuxbsc018,linuxbsc020,linuxbsc021,linuxbsc022,linuxbsc023,linuxbsc024,linuxbsc027,linuxbsc028,linuxbsc029,linuxbsc030,linuxbsc031,linuxbsc032,linuxbsc025 MPI_FastTest.exe
linuxbsc001: STDOUT: 24322 ?  Ss  0:00 
/opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess env -mca 
orte_ess_jobid 751435776 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 28 
--hnp-uri 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh
linuxbsc002: STDOUT:  2141 ?  Ss  0:00 
/opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess env -mca 
orte_ess_jobid 751435776 -mca orte_ess_vpid 2 -mca orte_ess_num_procs 28 
--hnp-uri 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh
linuxbsc003: STDOUT: 69265 ?  Ss  0:00 
/opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess env -mca 
orte_ess_jobid 751435776 -mca orte_ess_vpid 3 -mca