[OMPI devel] issues with mpirun --prefix syntax

2006-12-08 Thread Patrick Jessee


Hello.  For OpenMPI 1.1.2, I've come across a situation where the 
--prefix syntax does not seem to be working.  I've investigated the 
issue by stepping through the mpirun startup in a debugger.  Below is a 
summary of the problem and details about the investigation (along with a 
prospective fix).


Summary of the problem
===

When starting an Open MPI run with the --prefix option, the MPI 
application does not start up correctly in certain situations.   An 
important point is that this problem behavior is masked (and not seen) 
if the openMPI libraries are available at the compile/install-time 
location defined by OPAL_PKGLIBDIR (defined in 
opal/include/opal/install_dirs.h).  So in debugging the problem, it is 
important to move the openMPI installation from the installed location, 
and then set the --prefix value to the new location.   In addition, 
LD_LIBRARY_PATH needs to be set to the new location so mpirun can find 
liborte.so and libopal.so at program load time (--prefix can't help 
mpirun with liborte.so and libopal.so because (a) these libs are 
dynamically linked into mpirun and are needed at program load time, and 
(b) the --prefix arg isn't processed until after load time.  Thus 
LD_LIBRARY_PATH is needed for mpirun, but this is tangential).


The behavior that is seen is the following output:

--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

orte_sds_base_select failed
--> Returned value -13 instead of ORTE_SUCCESS
:
:
--
Open RTE was unable to initialize properly.  The error occurred while
attempting to orte_init().  Returned value -13 instead of ORTE_SUCCESS.
--


Investigation of the problem
===

As mentioned before, I've looked at mpirun in the debugger.  The 
mpirun instance and the MPI app find the dynamically linked 
libraries (liborte.so, libopal.so) just fine, but they do not locate the 
dynamically loaded ones (the ones in lib/openmpi such as 
mca_paffinity_linux.so, etc.).  The --prefix directory does not seem to 
be getting used to open the libraries in lib/openmpi.


It appears that the location to search is getting set in mca_base_open.c 
around line 68 (1.1.2):


asprintf(&value, "%s:~/.openmpi/components", OPAL_PKGLIBDIR);
mca_base_param_component_path =
    mca_base_param_reg_string_name("mca", "component_path",
        "Path where to look for Open MPI and ORTE components",
        false, false, value, NULL);


Here, OPAL_PKGLIBDIR is a fixed, compile-time location.  It appears that 
the --prefix directory (actually its lib/openmpi subdirectory) needs to be 
appended, if not prepended, to the component_path.  Alternatively, the 
static OPAL_PKGLIBDIR directory could just be replaced by the runtime 
location of that lib/openmpi subdirectory.
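
A minimal sketch of the prepend variant described above, in plain C and independent of the Open MPI code base. The names sketch_component_path and SKETCH_PKGLIBDIR are hypothetical stand-ins, and the sketch assumes the runtime prefix is somehow available as a string at this point, which is exactly the part that is still unresolved:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the compile-time OPAL_PKGLIBDIR value. */
#define SKETCH_PKGLIBDIR "/opt/openmpi/lib/openmpi"

/*
 * Build a component search path (hypothetical helper, not an Open MPI API).
 * If runtime_prefix is non-NULL and non-empty, runtime_prefix/lib/openmpi
 * is placed first so it is searched before the compile-time directory.
 * Returns a malloc'ed string the caller must free, or NULL on failure.
 */
static char *sketch_component_path(const char *runtime_prefix)
{
    int len;
    char *path;

    if (runtime_prefix != NULL && runtime_prefix[0] != '\0') {
        /* First pass computes the length, second pass writes the string. */
        len = snprintf(NULL, 0, "%s/lib/openmpi:%s:~/.openmpi/components",
                       runtime_prefix, SKETCH_PKGLIBDIR);
        if (len < 0) return NULL;
        path = malloc((size_t)len + 1);
        if (path == NULL) return NULL;
        snprintf(path, (size_t)len + 1,
                 "%s/lib/openmpi:%s:~/.openmpi/components",
                 runtime_prefix, SKETCH_PKGLIBDIR);
    } else {
        /* No runtime prefix: same result as the existing asprintf call. */
        len = snprintf(NULL, 0, "%s:~/.openmpi/components", SKETCH_PKGLIBDIR);
        if (len < 0) return NULL;
        path = malloc((size_t)len + 1);
        if (path == NULL) return NULL;
        snprintf(path, (size_t)len + 1, "%s:~/.openmpi/components",
                 SKETCH_PKGLIBDIR);
    }
    return path;
}
```

Prepending rather than replacing keeps the compile-time directory as a fallback, so an installation that was never moved would behave as before.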


I've compiled in a quick fix to libopal.so to see if the approach 
addressed the issue.  I didn't see how to get access to the --prefix 
directory at this point, so I just prepended getenv("LD_LIBRARY_PATH") 
to "value" and added the prefix's lib/openmpi directory to 
LD_LIBRARY_PATH before starting the app (note: this is just a way of 
verifying that if the --prefix directory were used here, it would address 
the issue; this is not a proposed solution.  The prefix's lib/openmpi 
directory should be used directly).  Anyway, this fixed the issue and the 
application was able to start.


In applying this fix, I also found that it was important for 
mca_base_param_component_path to include the prefix's lib/openmpi 
directory not only in the instances of mpirun and the MPI app, but also 
in all instances of orted before they dynamically load libraries.



In summary, it seems that this issue can be resolved by applying the 
--prefix directory (its lib/openmpi subdirectory) to 
mca_base_param_component_path in instances of mpirun, orted, and the MPI 
app.


Any help in getting this fix implemented in the code base would be very 
much appreciated, and I'll be happy to provide any more information or 
help.


Regards,

Patrick

P.S.  Even with the fix, a (non-fatal) message is printed.  It's 
probably a tangential issue, but thought it was worth mentioning. Again, 
the --prefix directory probably needs to be used somewhere in place of a 
static directory.  The message is:


--
Sorry!  You were supposed to get help about:
 rds:no-hostfile
from the file:
 help-rds-hostfile.txt
But I couldn't find any file matching that name.  Sorry!

Re: [OMPI devel] clarification regarding optimization of MPI collective calls

2006-12-08 Thread Christian Leber
On Fri, Dec 08, 2006 at 04:11:04PM +0530, krishna chaitanya wrote:

>  I learnt from a reliable source that MPI uses the services
>provided by TCP/IP or infiniband.

Did you hear that from a whistle-blower?

> Suppose that there is a bottle-neck in
> the TCP/IP layer itself, how will optimization of MPI calls really help?

Bottleneck is usually a fuzzy term.

A collective call basically means, for example, that you tell MPI:
"send this to all nodes"
instead of
for(i=0;i

Re: [OMPI devel] Major revision to the RML/OOB

2006-12-08 Thread Adrian Knoth
On Thu, Dec 07, 2006 at 11:12:23AM -0500, Jeff Squyres wrote:

Hi,

> > I therefore suggest to move the OPAL changes into the trunk,
> > also the small hostfile code (lex code for IPv6) and the btl code.
> Can you describe the changes in opal that were made for IPv6?

These changes are limited to three files: opal/util/if.[ch] and
the new opal/include/opal/ipv6compat.h. The latter one is only
required for compatibility with old SUSv2 systems.

In if.c, I've added IPv6 interface discovery for Linux and Solaris,
Thomas Peiselt also contributed getifaddrs() support for *BSD/OSX.
Helper functions were extended to deal with struct sockaddr_storage.

I've introduced CIDR netmask handling, so the netmask no longer
holds something like  (a.s.o), but simply 8, 16 or
whatever. There are helper functions to convert from and to CIDR.

/* convert a netmask (in network byte order) to CIDR notation */
static int prefix (uint32_t netmask)

/* convert a CIDR prefixlen to netmask (in network byte order) */
uint32_t opal_prefix2netmask (uint32_t prefixlen)
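
For illustration only, the two conversions could look roughly like the sketch below. This is not the code from if.c; the function names sketch_netmask2prefix and sketch_prefix2netmask are hypothetical, and the sketch assumes an IPv4 netmask with contiguous leading one bits:

```c
#include <stdint.h>
#include <arpa/inet.h>   /* htonl, ntohl */

/* Convert an IPv4 netmask (network byte order) to a CIDR prefix length. */
static int sketch_netmask2prefix(uint32_t netmask)
{
    uint32_t host = ntohl(netmask);
    int bits = 0;

    /* Count the contiguous one bits starting from the most significant. */
    while (host & 0x80000000u) {
        bits++;
        host <<= 1;
    }
    return bits;
}

/* Convert a CIDR prefix length (0..32) to a netmask in network byte order. */
static uint32_t sketch_prefix2netmask(uint32_t prefixlen)
{
    if (prefixlen == 0) {
        return 0;              /* avoid a shift by 32, which is undefined */
    }
    if (prefixlen > 32) {
        prefixlen = 32;        /* clamp out-of-range input */
    }
    return htonl(0xFFFFFFFFu << (32 - prefixlen));
}
```

With these definitions a /8 maps to the mask whose host-order value is 0xFF000000, and converting a prefix length to a mask and back returns the same length.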

I've also extended the interface struct, still containing if_index,
but that's just its number in the opal_list. The new field is
called if_kernel_index, representing the associated kernel interface
index for this device. My BTL/TCP code also exchanges this new
information to enable the remote to detect if two or more addresses
are assigned to the same interface, thus preventing oversubscription
(multiple connections to the same interface but to different addresses,
 which is very likely if you have at least one IPv6 address and one
 IPv4 address on the same interface)
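
As a rough sketch of that check: the struct below is a simplified, hypothetical record (only the two field names if_index and if_kernel_index come from the description above), and the helper names are made up for illustration. Two address entries are treated as the same device when their kernel indexes match:

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical interface record for illustration. */
struct sketch_if {
    int if_index;         /* position of the entry in the interface list */
    int if_kernel_index;  /* interface index assigned by the kernel */
};

/* True when two address entries belong to the same kernel interface. */
static bool sketch_same_device(const struct sketch_if *a,
                               const struct sketch_if *b)
{
    return a->if_kernel_index == b->if_kernel_index;
}

/* Count distinct kernel interfaces among n entries.
 * Quadratic, which is fine for the handful of interfaces on a host. */
static size_t sketch_count_devices(const struct sketch_if *ifs, size_t n)
{
    size_t distinct = 0;

    for (size_t i = 0; i < n; i++) {
        bool seen = false;
        for (size_t j = 0; j < i; j++) {
            if (sketch_same_device(&ifs[i], &ifs[j])) {
                seen = true;
                break;
            }
        }
        if (!seen) {
            distinct++;
        }
    }
    return distinct;
}
```

A peer that receives one IPv4 and one IPv6 entry with the same kernel index would count a single device here and could open one connection to it instead of two.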

The code in if.c handles both, AF_INET and AF_INET6, so it's no
problem to use it without using IPv6 somewhere else (i.e. oob/tcp,
btl/tcp).

HTH

-- 
mail: a...@thur.de  http://adi.thur.de  PGP: v2-key via keyserver

Drink wet cement and get really stoned!