Re: [OMPI users] Bug in oob_tcp_[in|ex]clude?

2007-12-18 Thread Jeff Squyres

On Dec 18, 2007, at 11:12 AM, Marco Sbrighi wrote:


> > Assumedly this(these) statement(s) are in a config file that is being
> > read by Open MPI, such as $HOME/.openmpi/mca-params.conf?


> I've tried many combinations: only in $HOME/.openmpi/mca-params.conf,
> only in command line and both; but none seems to work correctly.
> Nevertheless, what I'm expecting is that if something is specified in
> $HOME/.openmpi/mca-params.conf, then if differently specified in command
> line, the last should be assumed, I think.


The only difference in putting values in these locations should be the
order of precedence in which they are read.  As you stated, values on
the command line override everything else.  See
http://www.open-mpi.org/faq/?category=tuning#setting-mca-params.
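
As a rough illustration of the precedence order described above, here is a
minimal C sketch. It is not Open MPI's implementation: the function name
resolve_mca_param and its arguments are hypothetical. It assumes the order
given in the linked FAQ: a command-line value first, then the
OMPI_MCA_<name> environment variable, then a value read from a parameter
file such as $HOME/.openmpi/mca-params.conf.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not Open MPI code): choose the value of one MCA
 * parameter from up to three sources, highest precedence first. */
const char *resolve_mca_param(const char *name,
                              const char *cmdline_value, /* from --mca, or NULL */
                              const char *file_value)    /* from a param file, or NULL */
{
    char env_name[256];
    const char *env_value;

    assert(name != NULL);

    /* 1. A value given with --mca on the command line always wins. */
    if (cmdline_value != NULL) {
        return cmdline_value;
    }

    /* 2. Next comes the environment variable OMPI_MCA_<name>. */
    snprintf(env_name, sizeof(env_name), "OMPI_MCA_%s", name);
    env_value = getenv(env_name);
    if (env_value != NULL) {
        return env_value;
    }

    /* 3. Last is the value read from a parameter file. NULL here means the
     *    parameter was not set anywhere and the default would apply. */
    return file_value;
}
```

With this sketch, resolve_mca_param("oob_tcp_include", "eth1", "eth0")
returns "eth1": the command-line value hides whatever the file says.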

> > Yes, it does.  Specifying the same MCA param twice on the command line
> > results in undefined behavior -- it will only take one of them, and I
> > assume it'll take the first (but I'd have to check the code to be sure).


> OK, I can obtain the same behaviour using only one statement:
> --mca oob_tcp_include eth1,lo,eth0,ib0,ib1


FWIW, I traced the history of this code -- it looks like it dates all  
the way back to LAM/MPI, where if you specify "--mca foo bar --mca foo  
yow", then foo will get the value "bar,yow".  So it *is* intended  
(albeit undocumented!) behavior.  Who knew!  :-)
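
The comma-joining behavior described above can be sketched in a few lines
of C. This is only an illustration under stated assumptions, not the parser
from LAM/MPI or Open MPI: the function mca_append_value and its fixed-size
buffer are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch (not the real parser): record one more value for a
 * parameter. The first value is stored as-is; any later value for the same
 * parameter is appended after a comma, so "bar" then "yow" gives "bar,yow". */
void mca_append_value(char *current, size_t size, const char *value)
{
    size_t used;

    assert(current != NULL && value != NULL);
    used = strlen(current);
    assert(used < size);

    if (used == 0) {
        /* First occurrence of the parameter. */
        snprintf(current, size, "%s", value);
    } else {
        /* Repeated occurrence: join with a comma (truncates if full). */
        snprintf(current + used, size - used, ",%s", value);
    }
}
```

Feeding it "eth1" and then "lo,eth0,ib0,ib1" leaves
"eth1,lo,eth0,ib0,ib1" in the buffer, which is the same string as the
single-statement form.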


> note that using  --mca mpi_show_mca_params what I'm seeing in the report
> is the same for both statements (twice and single):
>
> .
> [node255:30188] oob_tcp_debug=0
> [node255:30188] oob_tcp_include=eth1,lo,eth0,ib0,ib1
> [node255:30188] oob_tcp_exclude=
> ...


So far, this is all consistent and expected.

> > Could you try with 1.2.3 or 1.2.4 (1.2.4 is the most recent; 1.2.5 is
> > due out "soon" -- it *may* get out before the holiday break, but no
> > promises...)?


> we have 1.2.3 in another cluster and it performs the same behaviour as
> 1.2.2  (BTW the other cluster has the same eth ifaces)


Crud.


> > If you can't upgrade, let me know and I can provide a debugging patch
> > that will give us a little more insight into what is happening on your
> > machines.  Thanks.


> It is quite difficult for us to upgrade the open-mpi now. We have the
> official CISCO packages installed, and I know the 1.2.2-1 is the only
> official CISCO's open-mpi distribution today



Here's a patch to the OMPI 1.2.2 source that adds some printf's in the  
OOB TCP interface selection logic that should show exactly what each  
process decides.  You should be able to run this with as few as 2  
processes to see what the decision-making process is for each of them.


[11:24] svbu-mpi:/home/jsquyres/openmpi-1.2.2 % diff -u orte/mca/oob/tcp/oob_tcp.c.orig orte/mca/oob/tcp/oob_tcp.c
--- orte/mca/oob/tcp/oob_tcp.c.orig 2007-12-18 11:21:08.0 -0800
+++ orte/mca/oob/tcp/oob_tcp.c  2007-12-18 11:22:29.0 -0800
@@ -1344,11 +1344,15 @@
 char name[32];
 opal_ifindextoname(i, name, sizeof(name));
 if (mca_oob_tcp_component.tcp_include != NULL &&
-strstr(mca_oob_tcp_component.tcp_include,name) == NULL)
+strstr(mca_oob_tcp_component.tcp_include,name) == NULL) {
+opal_output(0, "TCP OOB skipping %s because it's not in include (%s)\n", name, mca_oob_tcp_component.tcp_include);
 continue;
+}
 if (mca_oob_tcp_component.tcp_exclude != NULL &&
-strstr(mca_oob_tcp_component.tcp_exclude,name) != NULL)
+strstr(mca_oob_tcp_component.tcp_exclude,name) != NULL) {
+opal_output(0, "TCP OOB skipping %s because it's in exclude (%s)\n", name, mca_oob_tcp_component.tcp_exclude);
 continue;
+}
 opal_ifindextoaddr(i, (struct sockaddr*)&addr, sizeof(addr));
 if(opal_ifcount() > 1 &&
 opal_ifislocalhost((struct sockaddr*) &addr))
@@ -1356,6 +1360,7 @@
 if(ptr != contact_info) {
 ptr += sprintf(ptr, ";");
 }
+opal_output(0, "TCP OOB adding interface: %s\n", name);
 ptr += sprintf(ptr, "tcp://%s:%d", inet_ntoa(addr.sin_addr),
 ntohs(mca_oob_tcp_component.tcp_listen_port));
 }

I attached the patch as well in case my mail client / the mailing list  
munges it.
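
For readers who want to reason about the selection test in the patch
without building Open MPI, here is a small standalone C sketch of the same
two strstr() checks. The function name oob_if_selected is hypothetical and
the surrounding Open MPI loop is left out; only the include/exclude
comparison is mirrored from the diff above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two tests in the patched loop.
 * Returns 1 if the interface would be kept, 0 if it would be skipped.
 * include and exclude are comma-separated lists and may be NULL. */
int oob_if_selected(const char *name, const char *include, const char *exclude)
{
    assert(name != NULL);

    /* Skip when an include list exists and the name is not found in it. */
    if (include != NULL && strstr(include, name) == NULL) {
        return 0;
    }

    /* Skip when an exclude list exists and the name is found in it. */
    if (exclude != NULL && strstr(exclude, name) != NULL) {
        return 0;
    }

    return 1;
}
```

Because strstr() is a plain substring search over the whole list, this
sketch treats "eth1" as present in a list that contains "eth10". That is a
property of the comparison as written here, which may be worth keeping in
mind when reading the debug output.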


--
Jeff Squyres
Cisco Systems



ompi-1.2.2-oob-tcp-verbose.patch
Description: Binary data




Re: [OMPI users] Bug in oob_tcp_[in|ex]clude?

2007-12-18 Thread Marco Sbrighi
On Mon, 2007-12-17 at 20:58 -0500, Brian Dobbins wrote:
> Hi Marco and Jeff,
> 
>   My own knowledge of OpenMPI's internals is limited, but I thought
> I'd add my less-than-two-cents...
> 
> > > I've found only a way in order to have tcp connections bound only to
> > > the eth1 interface, using both the following MCA directives in the
> > > command line:
> > >
> > > mpirun  --mca oob_tcp_include eth1 --mca oob_tcp_include lo,eth0,ib0,ib1 .
> > >
> > > This sounds like a bug to me.
> >
> > Yes, it does.  Specifying the same MCA param twice on the command line
> > results in undefined behavior -- it will only take one of them, and I
> > assume it'll take the first (but I'd have to check the code to be sure).
>
>   I think that Marco intended to write:
>   mpirun  --mca oob_tcp_include eth1 --mca oob_tcp_exclude lo,eth0,ib0,ib1 ...

No, I intended to write exactly what I wrote. With --mca mpi_show_mca_params,
the double statement is reported exactly as if I had written a single
statement, as follows:

--mca oob_tcp_include eth1,lo,eth0,ib0,ib1

> 
>   Is this correct?  So you're not specifying include twice, you're
> specifying include and exclude, so each interface is explicitly stated
> in one list or the other.  I remember encountering this behaviour as
> well, in a slightly different format, but I can't seem to reproduce it
> now either. 

Note that the two lists never intersect.

>  That said, with these options, won't the MPI traffic (as opposed to
> the OOB traffic) still use the eth1,ib0 and ib1 interfaces?  You'd
> need to add '-mca btl_tcp_include eth1' in order to say it should only
> go over that NIC, I think. 

Yes, I know; in fact -mca btl_tcp_[if]_exclude lo,eth0,ib0,ib1
seems to work fine. I've been using this MCA parameter since open-mpi 1.2.1,
and the trouble with oob_tcp_[if]_[in|ex]clude seemed quite strange to
me; after all, the code used for the parser should be more or less the
same.

> 
>   As for the 'connection errors', two bizarre things to check are,
> first, that all of your nodes using eth1 actually have
> correct /etc/hosts mappings to the other nodes.  One system I ran on
> had this problem when some nodes had an IP address for node002 as one
> thing, and another node had node002's IP address as something
> different.   This should be easy enough by trying to run on one node
> first, then two nodes that you're sure have the correct addresses. 

Yes, I've already verified that. 

> 
>   .. The second situation is if you're launching an MPMD program.
> Here, you need to use '-gmca ' instead of '-mca '.
> 

No, currently I'm using only SPMD ones, and I hope to use them for the
rest of the century :-)

>   Hope some of that is at least a tad useful.  :) 
> 

Thank you very much, Brian,

Marco 

>   Cheers,
>   - Brian
> 
-- 
-
 Marco Sbrighi  m.sbri...@cineca.it

 HPC Group
 CINECA Interuniversity Computing Centre
 via Magnanelli, 6/3
 40033 Casalecchio di Reno (Bo) ITALY
 tel. 051 6171516



Re: [OMPI users] Bug in oob_tcp_[in|ex]clude?

2007-12-18 Thread Marco Sbrighi
On Mon, 2007-12-17 at 17:19 -0500, Jeff Squyres wrote:
> On Dec 17, 2007, at 8:35 AM, Marco Sbrighi wrote:
> 
> > I'm using Open MPI 1.2.2 over OFED 1.2 on a 256-node, dual Opteron,
> > dual core, Linux cluster. Of course, with Infiniband 4x interconnect.
> >
> > Each cluster node is equipped with 4 (or more) ethernet interfaces,
> > namely 2 gigabit ones plus 2 IPoIB. The two gig are named  eth0,eth1,
> > while the two IPoIB are named ib0,ib1.
> >
> > It happens that eth0 is a management network, with poor
> > performance, and furthermore we don't want to use the ib* to carry MPI's
> > traffic (neither OOB nor TCP), so we would like eth1 to be used for
> > Open MPI OOB and TCP.
> >
> > In order to drive the OOB over only eth1 I've tried various combinations
> > of oob_tcp_[ex|in]clude MCA statements, starting from the obvious
> >
> > oob_tcp_exclude = lo,eth0,ib0,ib1
> >
> > then trying the other obvious:
> >
> > oob_tcp_include = eth1
> 
> This one statement (_include) should be sufficient.

I agree with your interpretation, but what I'm experiencing here is that
"it should", but in fact it doesn't.

> 
> Assumedly this(these) statement(s) are in a config file that is being  
> read by Open MPI, such as $HOME/.openmpi/mca-params.conf?

I've tried many combinations: only in $HOME/.openmpi/mca-params.conf,
only in command line and both; but none seems to work correctly.
Nevertheless, what I'm expecting is that if something is specified in 
$HOME/.openmpi/mca-params.conf, then if differently specified in command
line, the last should be assumed, I think.
> 
> > and both at the same time.
> >
> > Next I've tried the following:
> >
> > oob_tcp_exclude = eth0
> >
> > but after the job starts, I still have a lot of tcp connections
> > established using eth0 or ib0 or ib1.
> > Furthermore It happens the following error:
> >
> >   [node191:03976] [0,1,14]-[0,1,12] mca_oob_tcp_peer_complete_connect:
> > connection failed: Connection timed out (110) - retrying
> 
> This is quite odd.  :-(
> 
> > I've found only a way in order to have tcp connections bound only to
> > the eth1 interface, using both the following MCA directives in the
> > command line:
> >
> > mpirun  --mca oob_tcp_include eth1 --mca oob_tcp_include lo,eth0,ib0,ib1 .
> >
> > This sounds like a bug to me.
> 
> Yes, it does.  Specifying the same MCA param twice on the command line
> results in undefined behavior -- it will only take one of them, and I
> assume it'll take the first (but I'd have to check the code to be sure).

OK, I can obtain the same behaviour using only one statement: 
--mca oob_tcp_include eth1,lo,eth0,ib0,ib1

note that using  --mca mpi_show_mca_params what I'm seeing in the report
is the same for both statements (twice and single):

.
[node255:30188] oob_tcp_debug=0
[node255:30188] oob_tcp_include=eth1,lo,eth0,ib0,ib1
[node255:30188] oob_tcp_exclude=
...


> 
> > Is there someone able to reproduce this behaviour?
> > If this is a bug, are there fixes?
> 
> 
> I'm unfortunately unable to reproduce this behavior.  I have a test  
> cluster with 2 IP interfaces: ib0, eth0.  I have tried several  
> combinations of MCA params with 1.2.2:
> 
> --mca oob_tcp_include ib0
> --mca oob_tcp_include ib0,bogus
> --mca oob_tcp_include eth0
> --mca oob_tcp_include eth0,bogus
> --mca oob_tcp_exclude ib0
> --mca oob_tcp_exclude ib0,bogus
> --mca oob_tcp_exclude eth0
> --mca oob_tcp_exclude eth0,bogus
> 
> All do as they are supposed to -- including or excluding ib0 or eth0.
> 
> I do note, however, that the handling of these parameters changed in  
> 1.2.3 -- as well as their names.  The names changed to  
> "oob_tcp_if_include" and "oob_tcp_if_exclude" to match other MCA  
> parameter name conventions from other components.
> 
> Could you try with 1.2.3 or 1.2.4 (1.2.4 is the most recent; 1.2.5 is  
> due out "soon" -- it *may* get out before the holiday break, but no  
> promises...)?

we have 1.2.3 in another cluster and it performs the same behaviour as
1.2.2  (BTW the other cluster has the same eth ifaces)

> 
> If you can't upgrade, let me know and I can provide a debugging patch  
> that will give us a little more insight into what is happening on your  
> machines.  Thanks.

It is quite difficult for us to upgrade the open-mpi now. We have the
official CISCO packages installed, and I know the 1.2.2-1 is the only
official CISCO's open-mpi distribution today 

In any case I would like to try your debug patch.

Thanks

Marco 

> 
-- 
-
 Marco Sbrighi  m.sbri...@cineca.it

 HPC Group
 CINECA Interuniversity Computing Centre
 via Magnanelli, 6/3
 40033 Casalecchio di Reno (Bo) ITALY
 tel. 051 6171516



Re: [OMPI users] Bug in oob_tcp_[in|ex]clude?

2007-12-17 Thread Brian Dobbins
Hi Marco and Jeff,

  My own knowledge of OpenMPI's internals is limited, but I thought I'd add
my less-than-two-cents...

> > I've found only a way in order to have tcp connections bound only to
> > the eth1 interface, using both the following MCA directives in the
> > command line:
> >
> > mpirun  --mca oob_tcp_include eth1 --mca oob_tcp_include lo,eth0,ib0,ib1 .
> >
> > This sounds like a bug to me.
>
> Yes, it does.  Specifying the same MCA param twice on the command line
> results in undefined behavior -- it will only take one of them, and I
> assume it'll take the first (but I'd have to check the code to be sure).


  I *think* that Marco intended to write:
  mpirun  --mca oob_tcp_include eth1 --mca oob_tcp_exclude
lo,eth0,ib0,ib1 ...

  Is this correct?  So you're not specifying include twice, you're
specifying include *and* exclude, so each interface is explicitly stated in
one list or the other.  I remember encountering this behaviour as well, in a
slightly different format, but I can't seem to reproduce it now either.
That said, with these options, won't the MPI traffic (as opposed to the OOB
traffic) still use the eth1,ib0 and ib1 interfaces?  You'd need to add '-mca
btl_tcp_include eth1' in order to say it should only go over that NIC, I
think.

  As for the 'connection errors', two bizarre things to check are, first,
that all of your nodes using eth1 actually have correct /etc/hosts mappings
to the other nodes.  One system I ran on had this problem when some nodes
had an IP address for node002 as one thing, and another node had node002's
IP address as something different.   This should be easy enough to check by
trying to run on one node first, then two nodes that you're sure have the
correct addresses.

  .. The second situation is if you're launching an MPMD program.  Here, you
need to use '-gmca ' instead of '-mca '.

  Hope some of that is at least a tad useful.  :)

  Cheers,
  - Brian


Re: [OMPI users] Bug in oob_tcp_[in|ex]clude?

2007-12-17 Thread Jeff Squyres

On Dec 17, 2007, at 8:35 AM, Marco Sbrighi wrote:


> I'm using Open MPI 1.2.2 over OFED 1.2 on a 256-node, dual Opteron,
> dual core, Linux cluster. Of course, with Infiniband 4x interconnect.
>
> Each cluster node is equipped with 4 (or more) ethernet interfaces,
> namely 2 gigabit ones plus 2 IPoIB. The two gig are named  eth0,eth1,
> while the two IPoIB are named ib0,ib1.
>
> It happens that eth0 is a management network, with poor
> performance, and furthermore we don't want to use the ib* to carry MPI's
> traffic (neither OOB nor TCP), so we would like eth1 to be used for
> Open MPI OOB and TCP.
>
> In order to drive the OOB over only eth1 I've tried various combinations
> of oob_tcp_[ex|in]clude MCA statements, starting from the obvious
>
> oob_tcp_exclude = lo,eth0,ib0,ib1
>
> then trying the other obvious:
>
> oob_tcp_include = eth1


This one statement (_include) should be sufficient.

Assumedly this(these) statement(s) are in a config file that is being  
read by Open MPI, such as $HOME/.openmpi/mca-params.conf?



> and both at the same time.
>
> Next I've tried the following:
>
> oob_tcp_exclude = eth0
>
> but after the job starts, I still have a lot of tcp connections
> established using eth0 or ib0 or ib1.
> Furthermore, the following error occurs:
>
>   [node191:03976] [0,1,14]-[0,1,12] mca_oob_tcp_peer_complete_connect:
> connection failed: Connection timed out (110) - retrying


This is quite odd.  :-(


> I've found only a way in order to have tcp connections bound only to
> the eth1 interface, using both the following MCA directives in the
> command line:
>
> mpirun  --mca oob_tcp_include eth1 --mca oob_tcp_include lo,eth0,ib0,ib1 .
>
> This sounds like a bug to me.


Yes, it does.  Specifying the same MCA param twice on the command line
results in undefined behavior -- it will only take one of them, and I
assume it'll take the first (but I'd have to check the code to be sure).



> Is there someone able to reproduce this behaviour?
> If this is a bug, are there fixes?



I'm unfortunately unable to reproduce this behavior.  I have a test  
cluster with 2 IP interfaces: ib0, eth0.  I have tried several  
combinations of MCA params with 1.2.2:


   --mca oob_tcp_include ib0
   --mca oob_tcp_include ib0,bogus
   --mca oob_tcp_include eth0
   --mca oob_tcp_include eth0,bogus
   --mca oob_tcp_exclude ib0
   --mca oob_tcp_exclude ib0,bogus
   --mca oob_tcp_exclude eth0
   --mca oob_tcp_exclude eth0,bogus

All do as they are supposed to -- including or excluding ib0 or eth0.

I do note, however, that the handling of these parameters changed in  
1.2.3 -- as well as their names.  The names changed to  
"oob_tcp_if_include" and "oob_tcp_if_exclude" to match other MCA  
parameter name conventions from other components.


Could you try with 1.2.3 or 1.2.4 (1.2.4 is the most recent; 1.2.5 is  
due out "soon" -- it *may* get out before the holiday break, but no  
promises...)?


If you can't upgrade, let me know and I can provide a debugging patch  
that will give us a little more insight into what is happening on your  
machines.  Thanks.


--
Jeff Squyres
Cisco Systems