Re: [OMPI devel] RFC: Add "virbr0" to [btl|oob]_tcp_if_exclude?

2012-02-10 Thread Paul H. Hargrove
That's probably a reflection of the status of the "Open MPI User 
Documentation" sub-project :-)


On 2/10/2012 5:12 PM, Jeff Squyres wrote:

> FWIW: google analytics indicates that the FAQ and the mailing list archives are 
> among the most heavily used sections of the web site.  :-)


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] RFC: Add "virbr0" to [btl|oob]_tcp_if_exclude?

2012-02-10 Thread Jeff Squyres
FWIW: google analytics indicates that the FAQ and the mailing list archives are 
among the most heavily used sections of the web site.  :-)

On Feb 10, 2012, at 8:09 PM, Paul H. Hargrove wrote:

> Much better - at least to the extent that users actually read FAQs :-)
> -Paul
> 
> On 2/10/2012 5:01 PM, Jeff Squyres (jsquyres) wrote:
>> Check out #220 now; I updated it.
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> HPC Research Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] RFC: Add "virbr0" to [btl|oob]_tcp_if_exclude?

2012-02-10 Thread Paul H. Hargrove

Much better - at least to the extent that users actually read FAQs :-)
-Paul

On 2/10/2012 5:01 PM, Jeff Squyres (jsquyres) wrote:

> Check out #220 now; I updated it.


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] RFC: Add "virbr0" to [btl|oob]_tcp_if_exclude?

2012-02-10 Thread Jeff Squyres (jsquyres)
Check out #220 now; I updated it. 

Sent from my phone. No type good. 

On Feb 10, 2012, at 4:46 PM, "Jeff Squyres"  wrote:

> On Feb 10, 2012, at 3:32 PM, Paul H. Hargrove wrote:
> 
>> The point of the question isn't related to WHY eth8 is useless - just assume 
>> it is.
>> Assume it is UP, but useless for whatever reasons motivated writing FAQ #220.
>> It could be Terry's example of a port connected to the service processor.
>> 
>> The concern is what happens in this situation when the user, following the 
>> advice in the FAQ, passes an explicit setting for btl_tcp_if_exclude, which 
>> DOES NOT include virbr0?
>> They don't know it was there before, or that it needs to be there (the FAQ 
>> states that lo MUST be included).
>> So, by following the FAQ they don't resolve their problem.
>> OMPI ceases any attempt to use eth8 (or whatever), but loss of the implicit 
>> virbr0 from the exclude list results in their system attempting to use 
>> virbr0 (and thus continuing to fail).  Right?
>> 
>> Maybe the FAQ just needs an update to address my concern.
> 
> Got it.  Sure, I can update the faq to be a bit more loose in the definition 
> of what must be excluded.
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] RFC: Add "virbr0" to [btl|oob]_tcp_if_exclude?

2012-02-10 Thread Jeff Squyres
On Feb 10, 2012, at 3:32 PM, Paul H. Hargrove wrote:

> The point of the question isn't related to WHY eth8 is useless - just assume 
> it is.
> Assume it is UP, but useless for whatever reasons motivated writing FAQ #220.
> It could be Terry's example of a port connected to the service processor.
> 
> The concern is what happens in this situation when the user, following the 
> advice in the FAQ, passes an explicit setting for btl_tcp_if_exclude, which 
> DOES NOT include virbr0?
> They don't know it was there before, or that it needs to be there (the FAQ 
> states that lo MUST be included).
> So, by following the FAQ they don't resolve their problem.
> OMPI ceases any attempt to use eth8 (or whatever), but loss of the implicit 
> virbr0 from the exclude list results in their system attempting to use virbr0 
> (and thus continuing to fail).  Right?
> 
> Maybe the FAQ just needs an update to address my concern.

Got it.  Sure, I can update the faq to be a bit more loose in the definition of 
what must be excluded.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] RFC: Add "virbr0" to [btl|oob]_tcp_if_exclude?

2012-02-10 Thread Paul H. Hargrove



On 2/10/2012 12:21 PM, Jeff Squyres wrote:

> On Feb 10, 2012, at 3:14 PM, Paul H. Hargrove wrote:
>
>> + User knows nothing about xen, and thus nothing about virbr0
>> + User has a local-only interface (eth8 in my made up example)
>> + User reads FAQ entry "220. How do I tell Open MPI which TCP networks to use?"
>> + User follows instructions given in said FAQ, yielding my example command line.
>
> Do you mean that eth8 is the only non-loopback interface on their laptop, and 
> it's disconnected?  (e.g., sitting on a train with no wifi and no wired 
> ethernet)
>
> Then OMPI would have disqualified that interface, anyway (because it wasn't up).
>
> I think I'm missing the zen of your question... :-\



The point of the question isn't related to WHY eth8 is useless - just 
assume it is.
Assume it is UP, but useless for whatever reasons motivated writing FAQ 
#220.

It could be Terry's example of a port connected to the service processor.

The concern is what happens in this situation when the user, following 
the advice in the FAQ, passes an explicit setting for 
btl_tcp_if_exclude, which DOES NOT include virbr0?
They don't know it was there before, or that it needs to be there (the 
FAQ states that lo MUST be included).

So, by following the FAQ they don't resolve their problem.
OMPI ceases any attempt to use eth8 (or whatever), but loss of the 
implicit virbr0 from the exclude list results in their system attempting 
to use virbr0 (and thus continuing to fail).  Right?


Maybe the FAQ just needs an update to address my concern.
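
A minimal sketch of the scenario described above, in Python rather than anything taken from the Open MPI source. It assumes that "virbr0" has been added to the default exclude list (the change this RFC proposes) and that an explicit btl_tcp_if_exclude value replaces the default outright instead of being merged with it. The names PROPOSED_DEFAULT_EXCLUDE, effective_exclude, and usable_interfaces are hypothetical.

```python
# Hypothetical default if the RFC were adopted; not read from Open MPI.
PROPOSED_DEFAULT_EXCLUDE = ["lo", "virbr0"]


def effective_exclude(user_value=None, default=PROPOSED_DEFAULT_EXCLUDE):
    """Return the exclude list in effect.

    Assumption: a value given on the command line replaces the
    default entirely; nothing from the default is carried over.
    """
    if user_value is None:
        return list(default)
    return [name.strip() for name in user_value.split(",") if name.strip()]


def usable_interfaces(up_interfaces, exclude):
    """Interfaces that are up and not named in the exclude list."""
    return [name for name in up_interfaces if name not in exclude]


# Made-up host: eth8 is up but useless, virbr0 is the local-only bridge.
up = ["lo", "eth0", "eth8", "virbr0"]

# No user setting: the proposed default hides virbr0, but eth8 is still used.
print(usable_interfaces(up, effective_exclude()))           # ['eth0', 'eth8']

# User follows the FAQ literally: eth8 is gone, but virbr0 is back.
print(usable_interfaces(up, effective_exclude("lo,eth8")))  # ['eth0', 'virbr0']
```

Under these assumptions the user would have to pass lo,virbr0,eth8 to get the intended result, which is why the FAQ wording matters.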

-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] RFC: Add "virbr0" to [btl|oob]_tcp_if_exclude?

2012-02-10 Thread Jeff Squyres
On Feb 10, 2012, at 3:14 PM, Paul H. Hargrove wrote:

> + User knows nothing about xen, and thus nothing about virbr0
> + User has a local-only interface (eth8 in my made up example)
> + User reads FAQ entry "220. How do I tell Open MPI which TCP networks to 
> use?"
> + User follows instructions given in said FAQ, yielding my example command 
> line.

Do you mean that eth8 is the only non-loopback interface on their laptop, and 
it's disconnected?  (e.g., sitting on a train with no wifi and no wired 
ethernet)

Then OMPI would have disqualified that interface, anyway (because it wasn't up).

I think I'm missing the zen of your question... :-\

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] RFC: Add "virbr0" to [btl|oob]_tcp_if_exclude?

2012-02-10 Thread Paul H. Hargrove



On 2/10/2012 12:03 PM, Jeff Squyres wrote:

> On Feb 10, 2012, at 1:44 PM, Paul H. Hargrove wrote:
>
>> Since the situation described is one where the user didn't know they 
>> could/should disable xen, it is reasonable to think they ALSO don't know they 
>> need to exclude virbr0.
>
> That's what I'm thinking.
>
>> So, I read the question as meaning the following:
>> What happens when a user who doesn't know anything about virbr0 does
>>  mpirun --mca btl_tcp_if_exclude lo,eth8
>
> I'm not sure I understand your question -- the above will exclude loopback and 
> eth8.
>
> (where did eth8 come from?)



Sorry, if I wasn't clear.
I'll try again:

+ User knows nothing about xen, and thus nothing about virbr0
+ User has a local-only interface (eth8 in my made up example)
+ User reads FAQ entry "220. How do I tell Open MPI which TCP networks 
to use?"
+ User follows instructions given in said FAQ, yielding my example 
command line.


-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] RFC: Add "virbr0" to [btl|oob]_tcp_if_exclude?

2012-02-10 Thread Jeff Squyres
On Feb 10, 2012, at 1:44 PM, Paul H. Hargrove wrote:

> Since the situation described is one where the user didn't know they 
> could/should disable xen, it is reasonable to think they ALSO don't know they 
> need to exclude virbr0.  

That's what I'm thinking.

> So, I read the question as meaning the following:
> What happens when a user who doesn't know anything about virbr0 does
>  mpirun --mca btl_tcp_if_exclude lo,eth8

I'm not sure I understand your question -- the above will exclude loopback and 
eth8.  

(where did eth8 come from?)

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] RFC: Add "virbr0" to [btl|oob]_tcp_if_exclude?

2012-02-10 Thread Paul H. Hargrove



On 2/10/2012 10:38 AM, Jeff Squyres wrote:

> On Feb 10, 2012, at 1:00 PM, TERRY DONTJE wrote:
>
>>> Should we add "virbr0" to the default value for [btl|oob]_tcp_if_exclude?
>>
>> What happens to that value if you then set btl_tcp_if_exclude to some value 
>> on the mpirun command line?
>
> It works just fine.  I.e., if you
>
>  mpirun --mca btl_tcp_if_exclude lo,virbr0 ...
>
> That works like a champ.


Since the situation described is one where the user didn't know they 
could/should disable xen, it is reasonable to think they ALSO don't know 
they need to exclude virbr0.  So, I read the question as meaning the 
following:

 What happens when a user who doesn't know anything about virbr0 does
  mpirun --mca btl_tcp_if_exclude lo,eth8
And my guess is "nothing good happens".

-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] RFC: Add "virbr0" to [btl|oob]_tcp_if_exclude?

2012-02-10 Thread Jeff Squyres
On Feb 10, 2012, at 1:00 PM, TERRY DONTJE wrote:

>> Should we add "virbr0" to the default value for [btl|oob]_tcp_if_exclude?  
>> 
> What happens to that value if you then set btl_tcp_if_exclude to some value 
> on the mpirun command line?  

It works just fine.  I.e., if you

mpirun --mca btl_tcp_if_exclude lo,virbr0 ...

That works like a champ.

But per Ralph's question, I don't know how generic that name is.  It *seems* to 
be specific to a virtualization interface (I assume "virbr" = "virtual 
bridge"), but I can't say that for sure.

> So this brings me to something that has annoyed me for a bit.  It seems to me 
> that maybe it would be nice to have a conf file into which you can dump 
> interface names to exclude, but which would not be interpreted as a 
> btl_tcp_if_exclude option.  For example, there were some interfaces on certain 
> Sun machines (a long time ago) that went to the diagnostic processor and caused 
> an issue similar to the virbr0 issue.  So we started delivering a conf file that 
> set btl_tcp_if_exclude, but then this precluded anyone from being able to set 
> btl_tcp_if_include.  If we had a file in which one could specify the set of 
> interfaces to use or exclude, but which allowed the user to operate on the 
> result of that set, it seems that would be nice.

I'm not sure what you're saying.  CLI always overrides config files...?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] RFC: Add "virbr0" to [btl|oob]_tcp_if_exclude?

2012-02-10 Thread TERRY DONTJE



On 2/10/2012 11:50 AM, Jeff Squyres wrote:

> This is an open question to OMPI developers...
>
> It looks like RHEL (and maybe others?) adds the "virbr0" IP interface when Xen 
> is activated.  This IP interface is only used to communicate with the local Xen 
> instance(s); it is not used to communicate over the real network.
>
> In a case that I saw, the interface is created, set to "up", and is given an IP 
> address in the 192.168.1.x range.  This was done by default -- all the user had 
> done was either say "yes, I want Xen enabled", or he didn't say he wanted it 
> *disabled* (I'm not sure which).

I've done the latter and hit the same problem.  There were instructions 
somewhere on the web that I found that told one how to disable virbr0.

> This causes a problem if you have Xen enabled on multiple machines in an OMPI 
> job.  OMPI will see the 192.168.1.x address and see that it's "up", so it'll 
> add it to the eligible subnets that can be used.  When OMPI sees that its peer 
> processes also have 192.168.1.x, it'll try to use that network for OOB/BTL 
> traffic -- which will fail, because these are local-only interfaces.

> Should we add "virbr0" to the default value for [btl|oob]_tcp_if_exclude?

What happens to that value if you then set btl_tcp_if_exclude to some 
value on the mpirun command line?

So this brings me to something that has annoyed me for a bit.  It seems 
to me that maybe it would be nice to have a conf file into which you can 
dump interface names to exclude, but which would not be interpreted as a 
btl_tcp_if_exclude option.  For example, there were some interfaces on 
certain Sun machines (a long time ago) that went to the diagnostic 
processor and caused an issue similar to the virbr0 issue.  So we started 
delivering a conf file that set btl_tcp_if_exclude, but then this 
precluded anyone from being able to set btl_tcp_if_include.  If we had a 
file in which one could specify the set of interfaces to use or exclude, 
but which allowed the user to operate on the result of that set, it seems 
that would be nice.
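
One way to read this proposal, as a rough Python sketch and not an existing Open MPI feature: a site-level list of interfaces is removed first, and the user's include or exclude setting then operates on whatever is left. The function select_interfaces and all of its parameters are hypothetical, and treating include and exclude as mutually exclusive is an assumption that mirrors the conflict described above.

```python
def select_interfaces(available, site_excluded,
                      user_include=None, user_exclude=None):
    """Apply a site-level exclusion set, then the user's own filter.

    available      -- interface names that are up on this host
    site_excluded  -- names the site admin never wants used
    user_include   -- optional allow-list supplied by the user
    user_exclude   -- optional deny-list supplied by the user
    """
    if user_include and user_exclude:
        raise ValueError("include and exclude are mutually exclusive")

    # Step 1: the site file is always honored, whatever the user passes.
    candidates = [name for name in available if name not in site_excluded]

    # Step 2: the user operates on the result of step 1.
    if user_include:
        return [name for name in candidates if name in user_include]
    if user_exclude:
        return [name for name in candidates if name not in user_exclude]
    return candidates


available = ["lo", "eth0", "eth8", "virbr0"]
site_excluded = {"lo", "virbr0"}

print(select_interfaces(available, site_excluded))
# ['eth0', 'eth8']
print(select_interfaces(available, site_excluded, user_exclude=["eth8"]))
# ['eth0']
print(select_interfaces(available, site_excluded, user_include=["eth0", "virbr0"]))
# ['eth0']
```

In this sketch the site list never occupies the user's include or exclude setting, so shipping it would not block anyone from using an include list.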


--td


> Or is there another way to detect that an interface is local-only and should 
> not be used for OOB/BTL communication?
>
> See this post on the user's list:
>
> http://www.open-mpi.org/community/lists/users/2012/02/18432.php



--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI devel] RFC: Add "virbr0" to [btl|oob]_tcp_if_exclude?

2012-02-10 Thread Ralph Castain

On Feb 10, 2012, at 9:50 AM, Jeff Squyres wrote:

> This is an open question to OMPI developers...
> 
> It looks like RHEL (and maybe others?) adds the "virbr0" IP interface when 
> Xen is activated.  This IP interface is only used to communicate with the 
> local Xen instance(s); it is not used to communicate over the real network.  
> 
> In a case that I saw, the interface is created, set to "up", and is given an 
> IP address in the 192.168.1.x range.  This was done by default -- all the 
> user had done was either say "yes, I want Xen enabled", or he didn't say he 
> wanted it *disabled* (I'm not sure which).
> 
> This causes a problem if you have Xen enabled on multiple machines in an OMPI 
> job.  OMPI will see the 192.168.1.x address and see that it's "up", so it'll 
> add it to the eligible subnets that can be used.  When OMPI sees that its 
> peer processes also have 192.168.1.x, it'll try to use that network for 
> OOB/BTL traffic -- which will fail, because these are local-only interfaces.
> 
> Should we add "virbr0" to the default value for [btl|oob]_tcp_if_exclude?  

How generic is that name? I've looked and can't find a way to detect a 
local-only interface, though you might be able to do it via ARP. Looking for a 
name, though, is pretty hit/miss.

> 
> Or is there another way to detect that an interface is local-only and should 
> not be used for OOB/BTL communication?
> 
> See this post on the user's list:
> 
> http://www.open-mpi.org/community/lists/users/2012/02/18432.php
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/




[OMPI devel] RFC: Add "virbr0" to [btl|oob]_tcp_if_exclude?

2012-02-10 Thread Jeff Squyres
This is an open question to OMPI developers...

It looks like RHEL (and maybe others?) adds the "virbr0" IP interface when Xen 
is activated.  This IP interface is only used to communicate with the local Xen 
instance(s); it is not used to communicate over the real network.  

In a case that I saw, the interface is created, set to "up", and is given an IP 
address in the 192.168.1.x range.  This was done by default -- all the user had 
done was either say "yes, I want Xen enabled", or he didn't say he wanted it 
*disabled* (I'm not sure which).

This causes a problem if you have Xen enabled on multiple machines in an OMPI 
job.  OMPI will see the 192.168.1.x address and see that it's "up", so it'll 
add it to the eligible subnets that can be used.  When OMPI sees that its peer 
processes also have 192.168.1.x, it'll try to use that network for OOB/BTL 
traffic -- which will fail, because these are local-only interfaces.
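
To illustrate why address matching alone is misleading here, a small Python sketch follows. It is not how Open MPI implements reachability. The host tables, the /24 prefix length, and the helper names same_subnet and apparent_matches are all made up for illustration.

```python
import ipaddress


def same_subnet(addr_a, addr_b, prefix_len=24):
    """True if both addresses fall in the same network for the given prefix."""
    net_a = ipaddress.ip_interface(f"{addr_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{prefix_len}").network
    return net_a == net_b


def apparent_matches(local, peer):
    """Interface pairs that look mutually reachable by subnet comparison."""
    return [
        (l_name, p_name)
        for l_name, l_addr in local.items()
        for p_name, p_addr in peer.items()
        if same_subnet(l_addr, p_addr)
    ]


# Hypothetical addresses: each host has a real NIC plus its own virbr0.
host_a = {"eth0": "10.0.0.5", "virbr0": "192.168.1.1"}
host_b = {"eth0": "10.0.0.6", "virbr0": "192.168.1.1"}

print(apparent_matches(host_a, host_b))
# [('eth0', 'eth0'), ('virbr0', 'virbr0')]
```

Both pairs look equally usable from the addresses alone, even though the two virbr0 bridges are separate local-only networks that happen to share an address range.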

Should we add "virbr0" to the default value for [btl|oob]_tcp_if_exclude?  

Or is there another way to detect that an interface is local-only and should 
not be used for OOB/BTL communication?

See this post on the user's list:

http://www.open-mpi.org/community/lists/users/2012/02/18432.php

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/