On Dec 31, 2007, at 12:50 AM, Jim Kusznir wrote:
I have some questions, though.
As you can probably tell, the multi-package stuff hasn't been tested
in quite a while. Thanks for taking it for a spin. :-)
1) am I correct in that OpenMPI needs to be compiled with each
compiler that will be used with it?
Short answer: yes.
Longer answer: if you only care about the C bindings for MPI, then
compiling Open MPI with any compiler should be fine. The need for
multiple compilations/installations largely stems from Fortran and C++
support (because different compilers use different symbol mangling
techniques). We usually advocate using the same compiler to compile
both Open MPI and the end-user application. E.g., if you have an
end-user MPI application that only works with compiler X, then have an
OMPI installation that was built with compiler X as well.
I am currently trying to make rpms using the included .spec file
(contrib/dist/linux/openmpi.spec, IIRC).
2) How do I use it to build against different compilers and end up
with non-colliding namespaces, etc?
Open MPI's configure script supports the standard way of overriding
compilers -- setting variables such as CC and CXX, either in the
environment or on the configure command line. For example:
./configure CC=icc CXX=icpc ...etc.
It looks like you already noticed that you can pass in arguments to
OMPI's configure script with the "configure_options" define. So you
can pass CC, CXX, etc. via this mechanism, too:
rpmbuild ... --define 'configure_options CC=icc CXX=icpc ...' ...
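To make the per-compiler builds concrete, here is a minimal sketch that only prints one rpmbuild command per compiler family rather than running anything. The "openmpi-<tag>" naming scheme, the use of "rpmbuild -bb" against the spec file path, and the compiler command names are assumptions for illustration; the _name override is also subject to the specfile caveats discussed further down in this mail.

```shell
#!/bin/sh
# Sketch only: print (do not run) one rpmbuild command per compiler family.
# Assumptions: spec file location, "openmpi-<tag>" package naming, and the
# compiler command names below are illustrative, not taken from the specfile.

SPEC=contrib/dist/linux/openmpi.spec

# build_cmd TAG CC CXX F77 FC -> prints the rpmbuild command line
build_cmd() {
    tag=$1 cc=$2 cxx=$3 f77=$4 fc=$5
    printf "rpmbuild -bb %s --define '_name openmpi-%s' --define 'configure_options CC=%s CXX=%s F77=%s FC=%s'\n" \
        "$SPEC" "$tag" "$cc" "$cxx" "$f77" "$fc"
}

build_cmd gcc   gcc  g++  g77   gfortran
build_cmd intel icc  icpc ifort ifort
build_cmd pgi   pgcc pgCC pgf77 pgf90
```

Review the printed commands first; piping the output to sh would then run the builds one after another.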
I am currently concerned with differentiating the same version compiled
with different compilers. I originally changed the name (--define
'_name openmpi-gcc'), but this broke the final phases of rpm building:
RPM build errors:
File not found: /var/tmp/openmpi-gcc-1.2.4-1-root/opt/openmpi-gcc/1.2.4/share/openmpi-gcc
I tried changing the version with "gcc" appended, but that also broke,
and as I thought about it more, I thought that would likely induce
headaches later with rpm only letting one version installed, etc.
Little known fact: RPM will allow as many installations of a single
package as you want, as long as none of the files overlap.
But I agree that differing solely by version number may be a bit
confusing.
You sent me a few more notes about this off-list; I'll take the
liberty of replying on-list so that the discussion is google-able:
The rpm build errored out near the end with a missing file. It was
trying to find /opt/openmpi-gcc/1.2.4/opt/share/openmpi-gcc (IIRC),
but the last part was actually openmpi on disk. I ended up
correcting it by changing line 182 (configuration logic) to:
%define _datadir /opt/%{name}/%{version}/share/%{name}
(I changed _pkgdatadir to _datadir). Your later directive if
_pkgdatadir is undefined took care of _pkgdatadir. I must admit, I
still don't fully understand where rpm was getting the idea to look
for that file...I tried manually configuring _pkgdatadir to the path
that existed, but that changed nothing. If I didn't rename the
package, it all worked fine.
Hmm. This is actually symptomatic of a larger problem -- Open MPI's
configure/build process is apparently not getting the _pkgdatadir
value, probably because there's no way to pass it on the configure
command line (i.e., there's no standard AC --pkgdatadir option).
Instead, the "$datadir/openmpi" location is hard-coded in the Open MPI
code base (in opal/mca/installdirs/config, if you care). As such,
when you re-defined %{_name}, the specfile didn't agree with where
OMPI actually installed the files, resulting in the error you saw.
Yuck.
Well, there are other reasons you can't have multiple OMPI
installations share a single installation tree (e.g., they'll all try
to install their own "mpirun" executable -- per a prior thread, the
--program-prefix/suffix stuff also doesn't work; see
https://svn.open-mpi.org/trac/ompi/ticket/1168 for details). So this
isn't making OMPI any worse than it already is. :-\
So I think the best solution for the moment is to just fix the
specfile's %_pkgdatadir to use the hard-coded name "openmpi" instead
of %{name}.
I committed these changes (and some other small fixes for things I
found while testing the _name and multi-package stuff) to the OMPI SVN
trunk in r17036 (see
https://svn.open-mpi.org/trac/ompi/changeset/17036) -- could you give
it a whirl and see if it works for you?
And another from an off-list mail:
In the preamble for the separate rpm files, the -devel and -docs
reference openmpi-runtime statically rather than using
%{name}-runtime, which breaks dependencies if you build under a
different name as I am.
Doh. I tried replacing the Requires: with %{_name}-runtime, but then
rpmbuild complained:
error: line 300: Dependency tokens must begin with alpha-numeric, '_' or '/': Requires: %{_name}-runtime
So it looks like Requires: will only take a hard-coded name, not a
variable (I have no comments in the specfile about this issue, but
perhaps that's why Greg/I hard-coded it in the first place...?).
Yuck. :-(
This error occurred with rpmbuild v4.3.3 (the default on RHEL4U4), so
I tried manually upgrading to v4.4.2.2 from rpm.org to see if this
constraint had been relaxed, but I couldn't [easily] get it to build.
I guess it wouldn't be attractive to use something that would only
work with the newest version of RPM, anyway.
We'll unfortunately have to do something different, then. :-(
Obvious but icky solutions include:
- remove the Requires statements
- protect the Requires statements to only be used when %{_name} is
"openmpi"
Got any better ideas?
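For what it's worth, the second option could be sketched roughly as below. This is an untested specfile fragment, not what is in the tree: it assumes the rpmbuild in use accepts quoted string comparisons in %if, and that the dependency line otherwise stays hard-coded as it is today.

```spec
# Sketch only: emit the dependency just for the default package name,
# so a renamed build (e.g. _name openmpi-gcc) does not require a
# package called "openmpi-runtime" that it never produces.
%if "%{_name}" == "openmpi"
Requires: openmpi-runtime
%endif
```

The cost is that renamed -devel and -docs packages carry no dependency on their runtime package at all.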
3) Will the resulting -runtime .rpms (for the different compiler
versions) coexist peacefully without any special environment munging
on the compute nodes, or do I need modules, etc. on all the compute
nodes as well?
They can co-exist peacefully out on the nodes because you should
choose different --prefix values for each installation (e.g.,
/opt/openmpi_gcc3.4.0/ or whatever naming convention you choose to use).
That being said, you should ensure that whatever version of OMPI you
use is consistent across an entire job. E.g., if job X was compiled
with the openmpi-gcc installation, then it should use the openmpi-gcc
installation on all the nodes on which it runs.
The easiest way to do that might be to use the
--enable-mpirun-prefix-by-default option to configure. This will cause
OMPI to use mpirun's --prefix option by default (even if you don't
specify it on the mpirun command line), which effectively tells the
remote nodes where OMPI lives (assuming your installation paths are the
same on all nodes -- e.g., /opt/openmpi-gcc). Then you can use
environment modules (or whatever) on your head node / the job's first
node to select which OMPI installation you want, use
mpicc/mpiCC/mpif77/mpif90 to compile your job, and then mpirun will do
the Right Thing to select the appropriate OMPI installation on remote
nodes, meaning that it will set the PATH and LD_LIBRARY_PATH on the
remote nodes for you.
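As a rough illustration of that setup, the sketch below prints configure command lines that give each compiler its own prefix and turn on the prefix-by-default behavior. The /opt/openmpi-<tag> layout, the application name "hello", and the host names are hypothetical; the configure and mpirun options themselves are the ones described above.

```shell
#!/bin/sh
# Sketch only: print a configure command line per compiler, each with its
# own installation prefix and --enable-mpirun-prefix-by-default.
# Assumptions: /opt/openmpi-<tag> naming; app and host names are made up.

# configure_cmd TAG CC CXX -> prints the configure command line
configure_cmd() {
    printf './configure --prefix=/opt/openmpi-%s --enable-mpirun-prefix-by-default CC=%s CXX=%s\n' \
        "$1" "$2" "$3"
}

configure_cmd gcc   gcc g++
configure_cmd intel icc icpc

# On the head node, after selecting an installation (e.g., via modules):
#   export PATH=/opt/openmpi-gcc/bin:$PATH
#   mpicc hello.c -o hello
#   mpirun -np 4 --host node1,node2 ./hello
# Without the configure option, the prefix has to be given explicitly:
#   mpirun --prefix /opt/openmpi-gcc -np 4 --host node1,node2 ./hello
```

Either way, the installation path has to be the same on every node for the remote PATH and LD_LIBRARY_PATH settings to point at something real.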
Make sense?
See:
http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path
http://www.open-mpi.org/faq/?category=running#mpirun-prefix
for a little more detail.
4) I've never really used pgi or intel's compiler. I saw notes in the
rpm about build flag problems and "use your normal optimizations and
flags", etc. As I have no concept of "normal" for these compilers,
are there any guides or examples I should/could use for this?
You'll probably want to check the docs for those compilers.
Generally, GCC-like -O options have similar definitions in these
compilers (they try to be similar to GCC). YMMV.
--
Jeff Squyres
Cisco Systems