Re: [OMPI devel] [IPv6] new component oob/tcp6

2006-09-06 Thread Ralph H Castain



On 9/6/06 9:44 AM, "Christian Kauhaus" wrote:

> Bogdan Costescu :
>> I don't know why you think that this (talking to different nodes via
>> different channels) is unusual - I think that it's quite probable,
>> especially in a heterogeneous environment.
> 
> I think the first goal should be to get IPv6 working -- and this is much
> easier when we restrict ourselves to the case where all systems
> participating in one(!) job are reachable via a single protocol version,
> either IPv4 or IPv6.
> 
> I'm not quite sure if we need to run a *single* job across a network
> with both systems that are not reachable via IPv4 and systems
> that are not reachable via IPv6. If there is a practical need for this,
> we will probably tackle this in the future. Note that the current plan
> does not restrict the use of OpenMPI in heterogeneous IPv4/IPv6
> environments, but we will not support mixed IPv4/IPv6 operation in a
> single job right now.
> 
> Our current plan is to look into the hostfile and see if there are
> 
> (1a) just IPv4 addresses
> (1b) IPv4 addresses and hostnames for which 'A' queries can be resolved
> (2a) just IPv6 addresses
> (2b) IPv6 addresses and hostnames for which 'AAAA' queries can be resolved.
> 
> In case 1 we initially use an IPv4 transport and in case 2 we initially
> use an IPv6 transport for the oob. If neither case 1 nor case 2 applies,
> we abort.
> 

Actually, that could cause us considerable problems. Only a subset of OpenRTE
and OpenMPI users actually have hostfiles - the majority do not. Hence, if
we base the IPv6 operation on what is in a hostfile we will be in trouble.

I believe we are going to have to use the "select" mechanism of the OOB
and/or the RML frameworks to let us know which protocol to use when talking
to a specific host.

I also believe you cannot assume that this choice will be consistent for all
processes involved in a job. For example, the head node process must talk to
the external network, which may well be IPv6. However, the nodes *inside*
the cluster may well be IPv4 since they could likely be sitting on a NAT.
The HNP still needs to talk to those nodes as well as the external network.

I don't believe that letting both modes co-exist is all that much harder a
problem to solve. We have similar situations elsewhere in the code base and
have found that the framework mechanism works very well in this situation.
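
To make the per-host idea concrete, here is a minimal sketch in Python rather
than in the actual OOB/RML code. Everything in it is hypothetical and only
illustrates the selection logic: the Component class, can_reach(),
select_component(), the priority values, and the peer names and addresses are
made up for this example and are not part of the MCA API.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical stand-in for an OOB component such as oob/tcp or oob/tcp6."""
    name: str
    ip_version: int
    priority: int

    def can_reach(self, peer_address):
        # A component can serve a peer address of its own IP version.
        return ipaddress.ip_address(peer_address).version == self.ip_version


def select_component(components, peer_addresses):
    """Return the highest-priority component that can reach any peer address."""
    usable = [c for c in components
              if any(c.can_reach(addr) for addr in peer_addresses)]
    return max(usable, key=lambda c: c.priority, default=None)


# Both components stay loaded; the choice is made per peer, not per job.
COMPONENTS = [Component("tcp", 4, 10), Component("tcp6", 6, 20)]

# Illustrative view from the HNP: one external IPv6 peer, one NATed IPv4
# node, and one dual-stacked node.
hnp_peers = {
    "external-client": ["2001:db8::10"],
    "compute-node-1": ["192.168.0.11"],
    "login-node": ["192.168.0.1", "2001:db8::1"],
}
selection = {peer: select_component(COMPONENTS, addrs).name
             for peer, addrs in hnp_peers.items()}
```

With these made-up priorities the dual-stacked peer ends up on tcp6; which
protocol should win in that case is a policy question the sketch does not
settle.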

I need to answer Adrian's note anyway and will describe there how to handle
multiple component operations.

> I hope that all can agree that this is a good starting point.
> 
> Regards
>   Christian




Re: [OMPI devel] [IPv6] new component oob/tcp6

2006-09-06 Thread Christian Kauhaus
Bogdan Costescu :
>I don't know why you think that this (talking to different nodes via 
>different channels) is unusual - I think that it's quite probable, 
>especially in a heterogeneous environment.

I think the first goal should be to get IPv6 working -- and this is much
easier when we restrict ourselves to the case where all systems
participating in one(!) job are reachable via a single protocol version,
either IPv4 or IPv6.

I'm not quite sure if we need to run a *single* job across a network
with both systems that are not reachable via IPv4 and systems
that are not reachable via IPv6. If there is a practical need for this,
we will probably tackle this in the future. Note that the current plan
does not restrict the use of OpenMPI in heterogeneous IPv4/IPv6
environments, but we will not support mixed IPv4/IPv6 operation in a
single job right now. 

Our current plan is to look into the hostfile and see if there are 

(1a) just IPv4 addresses
(1b) IPv4 addresses and hostnames for which 'A' queries can be resolved
(2a) just IPv6 addresses
(2b) IPv6 addresses and hostnames for which 'AAAA' queries can be resolved.

In case 1 we initially use an IPv4 transport and in case 2 we initially
use an IPv6 transport for the oob. If neither case 1 nor case 2 applies,
we abort.
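
The classification above could be sketched roughly as follows. This is an
illustration in Python, not the planned implementation. The function names
are invented for the example, the lookup goes through getaddrinfo() (which
may also consult /etc/hosts) instead of issuing explicit 'A' and 'AAAA'
queries, and preferring IPv4 when both cases apply is an assumption.

```python
import ipaddress
import socket


def resolve_with_getaddrinfo(hostname):
    """Return the IP versions for which the system resolver knows `hostname`."""
    families = set()
    for family, version in ((socket.AF_INET, 4), (socket.AF_INET6, 6)):
        try:
            if socket.getaddrinfo(hostname, None, family):
                families.add(version)
        except socket.gaierror:
            pass  # no address of this family
    return families


def entry_families(entry, resolve=None):
    """Return the set of IP versions (4 and/or 6) usable for one hostfile entry."""
    try:
        # Literal addresses are classified directly.
        return {ipaddress.ip_address(entry).version}
    except ValueError:
        pass  # not a literal address, so treat it as a hostname
    return (resolve or resolve_with_getaddrinfo)(entry)


def choose_oob_transport(entries, resolve=None):
    """Return 4 or 6 if every entry is reachable via that version, else None."""
    common = {4, 6}
    for entry in entries:
        common &= entry_families(entry, resolve)
    if 4 in common:
        return 4   # case 1 (IPv4 preferred when both work: an assumption)
    if 6 in common:
        return 6   # case 2
    return None    # neither case applies: abort
```

The resolve parameter exists only so that the lookup can be replaced, for
example with a table of known hosts when no resolver is available.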

I hope that all can agree that this is a good starting point. 

Regards
  Christian

-- 
Dipl.-Inf. Christian Kauhaus   <><
Lehrstuhl fuer Rechnerarchitektur und -kommunikation 
Institut fuer Informatik * Ernst-Abbe-Platz 1-2 * D-07743 Jena
Tel: +49 3641 9 46376  *  Fax: +49 3641 9 46372   *  Raum 3217


Re: [OMPI devel] [IPv6] new component oob/tcp6

2006-09-06 Thread Ralph H Castain
Actually, I was a part of that thread - see my comments beginning with
http://www.open-mpi.org/community/lists/devel/2006/03/0797.php.

Perhaps I communicated poorly here. The issue in the prior thread was that
few systems nowadays don't offer at least some level of IPv6 compatibility,
even if nothing more than mapping IPv6 addresses to IPv4. My point in that
thread was that some types of systems (e.g., embedded systems) don't - they
have no ability to interact with IPv6 at all - but that these are not
commonly found in the high performance world (the focus of OpenMPI).

Although I expect hetero operations to be fairly common, I don't expect to
see too many high performance systems that have no library support at all
for IPv6.

Hope that clarifies my comment. The intent is to fully support both types of
systems anyway, so I'll concede that the point (i.e., how unusual the
situation might be) is somewhat moot.


On 9/6/06 8:13 AM, "Bogdan Costescu" wrote:

> On Fri, 1 Sep 2006, Ralph Castain wrote:
> 
>> The only use case I am really concerned about is that of a Head Node
>> Process (HNP) that needs to talk to both IPv6 and IPv4 systems. I
>> admit this will be unusual,
> 
> This and other aspects were discussed or at least mentioned in a
> thread starting at:
> 
> http://www.open-mpi.org/community/lists/devel/2006/03/0781.php
> 
> I don't know why you think that this (talking to different nodes via
> different channels) is unusual - I think that it's quite probable,
> especially in a heterogeneous environment.
> 
> However, if the present discussion is only about a proof of concept
> version, then I'd say that anything to show IPv6 functionality would
> be acceptable.




Re: [OMPI devel] [Open MPI] #334: Building with Libtool 2.1a fails to run OpenIB BTL

2006-09-06 Thread Gleb Natapov
This error usually happens when libibverbs is dlopened without the
RTLD_GLOBAL flag.
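
For illustration, the difference between the two dlopen modes can be shown
through Python's ctypes, which passes its mode argument to dlopen() on
POSIX systems. This is only a sketch: open_library() is a name made up for
the example, and it does not show how Open MPI or libltdl actually opens its
components.

```python
import ctypes


def open_library(path, make_symbols_global):
    """dlopen `path` (None means the running program) with the chosen visibility.

    With RTLD_GLOBAL the library's symbols can satisfy undefined symbols in
    libraries loaded afterwards; with RTLD_LOCAL they cannot.
    """
    mode = ctypes.RTLD_GLOBAL if make_symbols_global else ctypes.RTLD_LOCAL
    return ctypes.CDLL(path, mode=mode)


# Opening the running program itself always works and keeps the example
# independent of which libraries are installed.
program = open_library(None, make_symbols_global=True)
```

In the case discussed here, a plugin that is not itself linked against
libibverbs can only resolve the libibverbs symbols it needs if libibverbs
was opened earlier with its symbols made global.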

On Wed, Sep 06, 2006 at 03:05:39PM +0200, Ralf Wildenhues wrote:
> Hello,
> 
> * Open MPI wrote on Wed, Sep 06, 2006 at 01:00:00PM CEST:
> > #334: Building with Libtool 2.1a fails to run OpenIB BTL
> 
> >Are you testing uninstalled or installed programs/libraries?
> > 
> >  Installed.
> > 
> >If you are testing an uninstalled program: does libtool generate a shell
> >  wrapper for it?  If yes: post the two shell wrappers generated by 1.5.22
> >  and HEAD.
> > 
> >  I was testing with a trivial "hello world" MPI application; i.e., one that
> >  I had compiled with mpicc and was running with "mpirun -np 2 --mca btl
> >  openib hello".  Hence, I was testing against installed trees of Open MPI.
> >  I took care to "rm -rf" the installation tree before testing each so that
> >  there would be no cruft left from prior installs.
> 
> OK, I can only assume that with 1.5.22, some code links against
> libibverbs, or loads it earlier at runtime, so that the symbol is
> present.  In any case I wonder why mthca.so isn't linked directly
> against libibverbs (maybe useful to suggest that to upstream).
> 
> To find out the culprit, here's a couple of quick (well) ways:
>   diff build-with-lt1.5/Makefile build-with-lt2.1a/Makefile
> 
> and look for differences in library link variables.  Otherwise, the
> config.log outputs should give clues.
> 
> To give you some more hints for possible causes: ompi_info could have a
> different set of RPATH entries, or different NEEDED libraries than your
> test executable; if any of those cause libibverbs.so to be loaded, then
> the symbol would be visible already.  Maybe your test executables even
> have different RPATHs or NEEDED libs (find out with 'objdump -p' and
> ldd)?
> 
> >  I've attached 2 tarballs to the bug (you have to go to the URL of the bug
> >  to get them; they are not included in the mails):
> 
> If there are tarballs available at
> http://svn.open-mpi.org/trac/ompi/ticket/334, then I'm too blind to find
> them.  Would that be elsewhere?
> 
> >   * One with all the configure output and the wrapper script for ompi_info.
> >  Note that ompi_info -- which lt_dlopen()'s the OpenIB BTL -- does not show
> >  the same problem (i.e., the OpenIB BTL opens properly and ompi_info shows
> >  its information).  This happens with both the uninstalled and installed
> >  ompi_info.
> >   * Another with the same for 2.1a.
> 
> That would be very helpful.
> 
> Cheers,
> Ralf

--
Gleb.


Re: [OMPI devel] [Open MPI] #334: Building with Libtool 2.1a fails to run OpenIB BTL

2006-09-06 Thread Ralf Wildenhues
Hello,

* Open MPI wrote on Wed, Sep 06, 2006 at 01:00:00PM CEST:
> #334: Building with Libtool 2.1a fails to run OpenIB BTL

>Are you testing uninstalled or installed programs/libraries?
> 
>  Installed.
> 
>If you are testing an uninstalled program: does libtool generate a shell
>  wrapper for it?  If yes: post the two shell wrappers generated by 1.5.22
>  and HEAD.
> 
>  I was testing with a trivial "hello world" MPI application; i.e., one that
>  I had compiled with mpicc and was running with "mpirun -np 2 --mca btl
>  openib hello".  Hence, I was testing against installed trees of Open MPI.
>  I took care to "rm -rf" the installation tree before testing each so that
>  there would be no cruft left from prior installs.

OK, I can only assume that with 1.5.22, some code links against
libibverbs, or loads it earlier at runtime, so that the symbol is
present.  In any case I wonder why mthca.so isn't linked directly
against libibverbs (maybe useful to suggest that to upstream).

To find out the culprit, here's a couple of quick (well) ways:
  diff build-with-lt1.5/Makefile build-with-lt2.1a/Makefile

and look for differences in library link variables.  Otherwise, the
config.log outputs should give clues.
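
If the full diff is too noisy, the comparison can be narrowed to the link
variables with a small script along these lines. It is a rough sketch: the
set of variable names (LIBS, LDFLAGS and the Automake-style *_LIBADD,
*_LDADD and *_LDFLAGS) is an assumption about what is worth looking at, and
continuation lines and "+=" assignments are ignored.

```python
import re

# Automake-style variables that typically carry link flags and libraries.
LINK_VARIABLE = re.compile(
    r"^(LIBS|LDFLAGS|\w+_(?:LIBADD|LDADD|LDFLAGS))\s*=\s*(.*)$")


def link_variables(makefile_text):
    """Map link-related variable names to their values."""
    found = {}
    for line in makefile_text.splitlines():
        match = LINK_VARIABLE.match(line)
        if match:
            found[match.group(1)] = match.group(2).strip()
    return found


def changed_link_variables(old_text, new_text):
    """Return {name: (old_value, new_value)} for link variables that differ."""
    old, new = link_variables(old_text), link_variables(new_text)
    return {name: (old.get(name), new.get(name))
            for name in sorted(old.keys() | new.keys())
            if old.get(name) != new.get(name)}
```

Calling changed_link_variables() on the contents of the two Makefiles
returns only the variables whose values differ; a variable missing from one
side shows up with None for that side.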

To give you some more hints for possible causes: ompi_info could have a
different set of RPATH entries, or different NEEDED libraries than your
test executable; if any of those cause libibverbs.so to be loaded, then
the symbol would be visible already.  Maybe your test executables even
have different RPATHs or NEEDED libs (find out with 'objdump -p' and
ldd)?
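
A small helper for that comparison might look like the sketch below. It
assumes GNU objdump is on the PATH and reads only the NEEDED, RPATH and
RUNPATH lines of the dynamic section; the function names are made up for
this example.

```python
import subprocess

DYNAMIC_TAGS = ("NEEDED", "RPATH", "RUNPATH")


def parse_dynamic_entries(objdump_output):
    """Collect NEEDED, RPATH and RUNPATH values from `objdump -p` text."""
    entries = {tag: [] for tag in DYNAMIC_TAGS}
    for line in objdump_output.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[0] in entries:
            entries[fields[0]].append(fields[1].strip())
    return entries


def dynamic_entries(path):
    """Run `objdump -p` on an ELF file and parse its dynamic section."""
    result = subprocess.run(["objdump", "-p", path], check=True,
                            capture_output=True, text=True)
    return parse_dynamic_entries(result.stdout)


def differing_tags(a, b):
    """Return {tag: (values_in_a, values_in_b)} for tags whose values differ."""
    return {tag: (a[tag], b[tag]) for tag in DYNAMIC_TAGS if a[tag] != b[tag]}
```

Running dynamic_entries() on ompi_info and on the test executable, and then
differing_tags() on the two results, shows whether one of them pulls in
libibverbs (or a different RPATH) and the other does not.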

>  I've attached 2 tarballs to the bug (you have to go to the URL of the bug
>  to get them; they are not included in the mails):

If there are tarballs available at
http://svn.open-mpi.org/trac/ompi/ticket/334, then I'm too blind to find
them.  Would that be elsewhere?

>   * One with all the configure output and the wrapper script for ompi_info.
>  Note that ompi_info -- which lt_dlopen()'s the OpenIB BTL -- does not show
>  the same problem (i.e., the OpenIB BTL opens properly and ompi_info shows
>  its information).  This happens with both the uninstalled and installed
>  ompi_info.
>   * Another with the same for 2.1a.

That would be very helpful.

Cheers,
Ralf