Re: [OMPI users] openmpi 1.3.1: bind() failed: Permission denied (13)

2009-04-02 Thread Dirk Eddelbuettel

On 3 April 2009 at 06:35, Jerome BENOIT wrote:
| It appeared that the file /etc/openmpi/openmpi-mca-params.conf on node green
| was the only one in the cluster to contain the line
| 
| btl_tcp_port_min_v4 = 49152

Great -- so can we now put your claims of 'the Debian package is broken' to 
rest?

This seems to be a local admin issue as such a line is unlikely to have been
added by either the Debian Open MPI or slurm packages.
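
One way to spot such a stray setting is to print only the active (non-comment) lines of the MCA parameter file on each node and compare the results. A minimal sketch; the function name active_mca_params is made up for illustration, and the path is the one quoted above:

```shell
# active_mca_params FILE: print the non-blank, non-comment lines of an
# MCA parameter file (hypothetical helper, not part of Open MPI).
active_mca_params() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Example use on one node (path as quoted above); run it on every node
# and diff the outputs:
#   active_mca_params /etc/openmpi/openmpi-mca-params.conf
```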

Good to know you have it all working,   

Dirk

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] openmpi 1.3.1: bind() failed: Permission denied (13)

2009-04-02 Thread Dirk Eddelbuettel

On 3 April 2009 at 03:33, Jerome BENOIT wrote:
| The above submission works the same on my clusters.
| But in fact, my issue involves interconnection between the nodes of the
| cluster: the above examples involve no connection between nodes.
| 
| My cluster is a cluster of quadcore computers:
| if in the sbatch script
| 
| #SBATCH --nodes=7
| #SBATCH --ntasks=15
| 
| is replaced by
| 
| #SBATCH --nodes=1
| #SBATCH --ntasks=4
| 
| everything is fine as no interconnection is involved.
| 
| Can you test the interconnection part of the story?

Again, think about it in terms of layers. You have a problem with slurm on top
of Open MPI.

So before blaming Open MPI, I would try something like this:

~$ orterun -np 2 -H abc,xyz /tmp/jerome_hw
Hello world! I am 1 of 2 and my name is `abc'
Hello world! I am 0 of 2 and my name is `xyz'
~$

ie whether the simple MPI example can be launched successfully on two nodes or 
not.
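
To make that check mechanical, the output of the hello-world program can be piped through a small filter that counts how many distinct node names reported in. This is only a sketch: count_hosts is a made-up helper, and it assumes the exact output format shown above.

```shell
# count_hosts: read hello-world output on stdin and print the number of
# distinct node names found between the `...' quotes (hypothetical helper).
count_hosts() {
    sed -n "s/.*my name is \`\(.*\)'\$/\1/p" | sort -u | wc -l | tr -d ' '
}

# Example use; a result of 2 means both nodes took part:
#   orterun -np 2 -H abc,xyz /tmp/jerome_hw | count_hosts
```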

Dirk

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] openmpi 1.3.1: bind() failed: Permission denied (13)

2009-04-02 Thread Dirk Eddelbuettel

[ It is considered bad form to publicly reply to a private message. What I
had sent you earlier was a private mail. ]

On 3 April 2009 at 02:41, Jerome BENOIT wrote:
| 
|  Original Message 
| Subject: Re: [OMPI users] openmpi 1.3.1: bind() failed: Permission denied (13)
| Date: Fri, 03 Apr 2009 02:41:01 +0800
| From: Jerome BENOIT <ml.jgmben...@mailsnare.net>
| Reply-To: ml.jgmben...@mailsnare.net
| To: Dirk Eddelbuettel <e...@debian.org>
| CC: ml.jgmben...@mailsnare.net
| References: <49ce5244.2000...@mailsnare.net>
|  <cf5d8e90-17ca-4b60-ae85-2bc2ee318...@cisco.com>
|  <49d4ef88.6060...@mailsnare.net> <18901.114.820349.347...@ron.nulle.part>
| 
| Hello List,
| 
| so let me be precise:
| 
| I submitted on a SLURM box the attached C source phello.c via sbatch with the
| attached script phello.sh
| 
| mpicc -o phello phello.c
| sbatch phello.sh

Works for me (though I prefer salloc), suggesting that you did something to
your network topology or Open MPI configuration:

:~$ cat /tmp/jerome_hw.c
// mpicc -o phello phello.c
// mpirun -np 5 phello

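#include <stdio.h>   // fprintf
#include <unistd.h>  // sleep
#include <mpi.h>

int main(int narg, char *args[]){
  int rank,size;
  char ProcessorName[MPI_MAX_PROCESSOR_NAME];
  int ProcessorNameLength;

  // MPI_Init takes pointers to argc/argv
  MPI_Init(&narg,&args);

  // rank of this process and total number of processes
  MPI_Comm_rank(MPI_COMM_WORLD,&rank);
  MPI_Comm_size(MPI_COMM_WORLD,&size);

  // name of the node this process runs on
  MPI_Get_processor_name(ProcessorName,&ProcessorNameLength);
  sleep(11);
  fprintf(stdout,
          "Hello world! I am %d of %d and my name is `%s'\n",
          rank,size,
          ProcessorName);

  MPI_Finalize();

  return 0; }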

//
// End of file `phello.c'.

:~$ mpicc.openmpi -o /tmp/jerome_hw /tmp/jerome_hw.c
:~$ orterun -np 2 /tmp/jerome_hw
Hello world! I am 1 of 2 and my name is `xyz-1'
Hello world! I am 0 of 2 and my name is `xyz-1'
:~$ salloc orterun -np 2 /tmp/jerome_hw
salloc: Granted job allocation 421
Hello world! I am 0 of 2 and my name is `xyz-1'
Hello world! I am 1 of 2 and my name is `xyz-1'
salloc: Relinquishing job allocation 421
:~$

| I have set no MCA parameter, and the firewalls are off, and the kernels
| (2.6.16) were built with no Security feature.

Try simplifying further: no default hosts beside localhost etc.  Try orterun
before you try salloc.  Simplicity first.

Dirk

-- 
Three out of two people have difficulties with fractions.


[OMPI users] Open MPI 1.3 segfault on amd64 with Rmpi

2009-01-26 Thread Dirk Eddelbuettel

I am chasing a segfault when I use Open MPI (1.3) with Rmpi (0.5.6), the MPI
add-on package for R that is written and maintained by Prof Hao Yu (CC'ed).

I should preface this by saying that the code runs just fine on a 32-bit
Debian system at home.  However, on amd64 running Ubuntu 8.10, I am seeing
segfaults upon initialisation.  I use the same R and Open MPI packages on both
systems, suitably recompiled. One of the bigger toolkit differences is the
1.5.26 version of libtool on Debian versus 2.2.4 on Ubuntu.

Gdb doesn't want to step into the Open MPI code; I used debugging symbols for
both R and Open MPI that are available via -dbg packages with the debugging
info.  So descending one function at a time, I see the following calling
sequence

  MPI_Init
  ompi_mpi_init
  orte_init
  opal_init
  opal_paffinity_base_open
  mca_base_components_open
  open_components

where things end in the loop over opal_list() elements.  I still see a
fprintf() statement just before

   if (MCA_SUCCESS == component->mca_register_component_params()) {

in the middle of the open_components function in the file
mca_base_components_open.c 

Does this make any sense?  I was at first worried that the dynamic loading
failed -- yet this does not seem to be the case as the mpi, open-rte and
open-pal libraries are loaded and I also see code from some of the modules
being executed.  I somehow fear that something is colliding with GNU R, but
despite some familiarity with R I have to admit that I do not know where this
could come from.

Any pointers would be appreciated.

Regards, Dirk

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] how to get openMPI working, someone help me.

2008-11-18 Thread Dirk Eddelbuettel

On 18 November 2008 at 17:06, Venu Gopal wrote:
| Hello,
| 
| I am new into this mailing list, and am trying to install openMPI on Ubuntu
| 8.0.4.1

(That's not an existing version number.)

| Basically my idea is to build a beowulf. Well, right now I don't even have
| lots of PCs for this purpose.
| 
| So I am planning to first use virtual machines on VmWare. I have installed
| around four Virtual machines on my PC. And all of them can talk to each
| other. I mean they are all networked together without any firewalls in
| between.
| 
| I downloaded openmpi-1.2.8.tar.gz, extracted it and executed the configure
| script file.
| 
| This gave me lots of errors, and didn't succeed. How do I get this working
| now?

Open MPI is packaged for Debian and hence part of Ubuntu. So just do:

   $ sudo apt-get install libopenmpi1 libopenmpi-dev openmpi-bin openmpi-doc

In Ubuntu 8.10, this gives you Open MPI 1.2.7. In Ubuntu 8.04, you're at a
slightly older version so I suggest upgrading.

In case you really want 1.2.8, by far the easiest way (and also most general)
is to just grab the Debian source from 'Debian unstable' and rebuild on your
system to match your libraries. That can be quasi-automated, see 'apt-get
source' and use google as this is getting off-topic for this list.
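
As a rough sketch of that rebuild, assuming a deb-src line for Debian unstable is already in the apt sources and that libopenmpi-dev resolves to the right source package: the run wrapper and the DRY_RUN variable are made up here so the steps can be previewed without touching the system.

```shell
# run: print the command when DRY_RUN=1, otherwise execute it
# (hypothetical helper for previewing the steps).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# rebuild_openmpi: install build dependencies, fetch the Debian source
# package, and build unsigned binary packages against local libraries.
rebuild_openmpi() {
    run sudo apt-get build-dep libopenmpi-dev &&
    run apt-get source libopenmpi-dev &&
    run sh -c 'cd openmpi-*/ && dpkg-buildpackage -us -uc -b'
}

# Preview only:  DRY_RUN=1; rebuild_openmpi
```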

Hope this helps, 

Dirk
(one of several Debian Open MPI maintainers)

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] Rmpi installation issues

2008-10-13 Thread Dirk Eddelbuettel
On Mon, Oct 13, 2008 at 03:56:19PM +0200, Simone Giannerini wrote:
> Dear Dirk,
> 
> many thanks for your reply, please see below,

Your mail is basically unreadable: it is impossible to tell what is
original and what is cited.  Please follow convention and indent.

> On 13 October 2008 at 11:22, Simone Giannerini wrote:
> | Dear all,
> |
> | I have troubles installing rmpi 0.5-5 (or 0.5-6) on a quad opteron machine
> 
> | with OpenSUSE 11.0 and
> | R 2.7.2
> |
> | platform x86_64-unknown-linux-gnu
> | arch x86_64
> | os linux-gnu
> | system x86_64, linux-gnu
> | status Patched
> | major 2
> | minor 7.2
> | year 2008
> | month 09
> | day 18
> | svn rev 46546
> | language R
> | version.string R version 2.7.2 Patched (2008-09-18 r46546)
> |
> | I tried the following
> |
> | # export MPI_ROOT=/usr/lib64/mpi/gcc/openmpi/
> |
> | # R CMD INSTALL Rmpi_0.5-6.tar.gz
> 
> Where did you get 0.5-6 from? The newest, per the author's website, is
> 0.5-5.
> 
> http://www.stats.uwo.ca/faculty/yu/Rmpi/download/dev

Ok, didn't see that.  But please do understand that 0.5-5 is on CRAN
and released. (And yes, that is the version for which I also had to
make a fix to get it to build on Debian as mentioned).

> | * Installing to library '/usr/local/lib64/R/library'
> | * Installing *source* package 'Rmpi' ...
> | checking for gcc... gcc
> | checking for C compiler default output file name... a.out
> | checking whether the C compiler works... yes
> | checking whether we are cross compiling... no
> | checking for suffix of executables...
> | checking for suffix of object files... o
> | checking whether we are using the GNU C compiler... yes
> | checking whether gcc accepts -g... yes
> | checking for gcc option to accept ISO C89... none needed
> | Try to find mpi.h ...
> | Found in /usr/lib64/mpi/gcc/openmpi//include
> | Try to find libmpi.so or libmpich.a
> | checking for main in -lmpi... yes
> 
> At this point you have mpi.h and libmpi. Looks good.
> 
> | checking for openpty in -lutil... yes
> | checking for main in -lpthread... yes
> | configure: creating ./config.status
> | config.status: creating src/Makevars
> | ** libs
> | gcc -std=gnu99 -I/usr/local/lib64/R/include -DPACKAGE_NAME=\"\"
> | -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\"
> | -DPACKAGE_BUGREPORT=\"\" -I/usr/lib64/mpi/gcc/openmpi//include -DMPI2
> | -DOPENMPI -fPIC -I/usr/local/include -fpic -g -O2 -c RegQuery.c -o
> | RegQuery.o
> | gcc -std=gnu99 -I/usr/local/lib64/R/include -DPACKAGE_NAME=\"\"
> | -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\"
> | -DPACKAGE_BUGREPORT=\"\" -I/usr/lib64/mpi/gcc/openmpi//include -DMPI2
> | -DOPENMPI -fPIC -I/usr/local/include -fpic -g -O2 -c Rmpi.c -o Rmpi.o
> | gcc -std=gnu99 -I/usr/local/lib64/R/include -DPACKAGE_NAME=\"\"
> | -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\"
> | -DPACKAGE_BUGREPORT=\"\" -I/usr/lib64/mpi/gcc/openmpi//include -DMPI2
> | -DOPENMPI -fPIC -I/usr/local/include -fpic -g -O2 -c conversion.c -o
> | conversion.o
> | gcc -std=gnu99 -I/usr/local/lib64/R/include -DPACKAGE_NAME=\"\"
> | -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\"
> | -DPACKAGE_BUGREPORT=\"\" -I/usr/lib64/mpi/gcc/openmpi//include -DMPI2
> | -DOPENMPI -fPIC -I/usr/local/include -fpic -g -O2 -c internal.c -o
> | internal.o
> 
> It all compiles, thanks to mpi.h.
> 
> | gcc -std=gnu99 -shared -L/usr/local/lib64 -o Rmpi.so RegQuery.o Rmpi.o
> | conversion.o internal.o -L/usr/lib64/mpi/gcc/openmpi//lib -lmpi -lutil
> | -lpthread -fPIC
> | /usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld:
> 
> | cannot find -lmpi
> | collect2: ld returned 1 exit status
> | make: *** [Rmpi.so] Error 1
> | chmod: cannot access `/usr/local/lib64/R/library/Rmpi/libs/*': No such
> file
> | or directory
> | ERROR: compilation failed for package 'Rmpi'
> | ** Removing '/usr/local/lib64/R/library/Rmpi'
> 
> This seems to indicate that your installation of Open MPI conflicts with
> your
> setting of
> export MPI_ROOT=/usr/lib64/mpi/gcc/openmpi/
> as this directory is expected to contain include/ and lib/
> 
> I am not sure I got this,  Open MPI is installed in
> /usr/lib64/mpi/gcc/openmpi/ and such directory contains both include and lib
> folders:
> 
> gauss:/usr/lib64/mpi/gcc/openmpi # ls -R
> bin  include  lib64  share

No, it does not, as lib64 != lib.  You can probably get it installed if you
create a softlink from

/usr/lib64/mpi/gcc/openmpi/lib64

to  

/usr/lib64/mpi/gcc/openmpi/lib
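
A minimal sketch of creating that link, assuming root privileges on the real path; link_lib_to_lib64 is a made-up helper name, and it does nothing if lib already exists:

```shell
# link_lib_to_lib64 ROOT: make ROOT/lib a relative symlink to lib64,
# but only if lib64 exists and lib does not (hypothetical helper).
link_lib_to_lib64() {
    root=${1%/}
    if [ -d "$root/lib64" ] && [ ! -e "$root/lib" ]; then
        ln -s lib64 "$root/lib"
    fi
}

# Example use (as root):
#   link_lib_to_lib64 /usr/lib64/mpi/gcc/openmpi/
```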

Dirk

> ./bin:
> mpiCC   mpicc   mpiexec  mpif90  mpivars.csh  ompi_info opalc++
> ortec++  orted
> mpic++  mpicxx  mpif77   mpirun  mpivars.sh   opal_wrapper  opalcc
> ortecc   orterun
> 
> ./include:
> mpi.h  mpif-common.h  mpif-config.h  mpif.h  openmpi
> 
> ./include/openmpi:
> ompi   opal   opal_config_bottom.h  orte
> ompi_config.h  opal_config.h  opal_stdint.h orte_config.h
> 
> [...]
> 
> ./lib64:
> libmca_common_sm.la  

Re: [OMPI users] Rmpi installation issues

2008-10-13 Thread Dirk Eddelbuettel

On 13 October 2008 at 11:22, Simone Giannerini wrote:
| Dear all,
| 
| I have troubles installing rmpi  0.5-5 (or 0.5-6) on a quad opteron machine
| with OpenSUSE 11.0 and
| R 2.7.2
| 
| platform       x86_64-unknown-linux-gnu
| arch           x86_64
| os             linux-gnu
| system         x86_64, linux-gnu
| status         Patched
| major          2
| minor          7.2
| year           2008
| month          09
| day            18
| svn rev        46546
| language       R
| version.string R version 2.7.2 Patched (2008-09-18 r46546)
| 
| I tried the following
| 
| # export MPI_ROOT=/usr/lib64/mpi/gcc/openmpi/
| 
| # R CMD INSTALL  Rmpi_0.5-6.tar.gz

Where did you get 0.5-6 from? The newest, per the author's website, is 0.5-5.

| * Installing to library '/usr/local/lib64/R/library'
| * Installing *source* package 'Rmpi' ...
| checking for gcc... gcc
| checking for C compiler default output file name... a.out
| checking whether the C compiler works... yes
| checking whether we are cross compiling... no
| checking for suffix of executables...
| checking for suffix of object files... o
| checking whether we are using the GNU C compiler... yes
| checking whether gcc accepts -g... yes
| checking for gcc option to accept ISO C89... none needed
| Try to find mpi.h ...
| Found in /usr/lib64/mpi/gcc/openmpi//include
| Try to find libmpi.so or libmpich.a
| checking for main in -lmpi... yes

At this point you have mpi.h and libmpi. Looks good.

| checking for openpty in -lutil... yes
| checking for main in -lpthread... yes
| configure: creating ./config.status
| config.status: creating src/Makevars
| ** libs
| gcc -std=gnu99 -I/usr/local/lib64/R/include -DPACKAGE_NAME=\"\"
| -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\"
| -DPACKAGE_BUGREPORT=\"\" -I/usr/lib64/mpi/gcc/openmpi//include -DMPI2
| -DOPENMPI -fPIC -I/usr/local/include -fpic -g -O2 -c RegQuery.c -o
| RegQuery.o
| gcc -std=gnu99 -I/usr/local/lib64/R/include -DPACKAGE_NAME=\"\"
| -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\"
| -DPACKAGE_BUGREPORT=\"\" -I/usr/lib64/mpi/gcc/openmpi//include -DMPI2
| -DOPENMPI -fPIC -I/usr/local/include -fpic -g -O2 -c Rmpi.c -o Rmpi.o
| gcc -std=gnu99 -I/usr/local/lib64/R/include -DPACKAGE_NAME=\"\"
| -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\"
| -DPACKAGE_BUGREPORT=\"\" -I/usr/lib64/mpi/gcc/openmpi//include -DMPI2
| -DOPENMPI -fPIC -I/usr/local/include -fpic -g -O2 -c conversion.c -o
| conversion.o
| gcc -std=gnu99 -I/usr/local/lib64/R/include -DPACKAGE_NAME=\"\"
| -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\"
| -DPACKAGE_BUGREPORT=\"\" -I/usr/lib64/mpi/gcc/openmpi//include -DMPI2
| -DOPENMPI -fPIC -I/usr/local/include -fpic -g -O2 -c internal.c -o
| internal.o

It all compiles, thanks to mpi.h.

| gcc -std=gnu99 -shared -L/usr/local/lib64 -o Rmpi.so RegQuery.o Rmpi.o
| conversion.o internal.o -L/usr/lib64/mpi/gcc/openmpi//lib -lmpi -lutil
| -lpthread -fPIC
| /usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld:
| cannot find -lmpi
| collect2: ld returned 1 exit status
| make: *** [Rmpi.so] Error 1
| chmod: cannot access `/usr/local/lib64/R/library/Rmpi/libs/*': No such file
| or directory
| ERROR: compilation failed for package 'Rmpi'
| ** Removing '/usr/local/lib64/R/library/Rmpi'

This seems to indicate that your installation of Open MPI conflicts with your
setting of 
export MPI_ROOT=/usr/lib64/mpi/gcc/openmpi/
as this directory is expected to contain include/ and lib/
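
A quick way to confirm that expectation before running R CMD INSTALL is to test for both subdirectories. This is a sketch only; check_mpi_root is a made-up helper, not something Rmpi's configure script provides.

```shell
# check_mpi_root ROOT: report which of include/ and lib/ are missing
# under ROOT; prints nothing when both are present (hypothetical helper).
check_mpi_root() {
    for sub in include lib; do
        [ -d "${1%/}/$sub" ] || echo "missing: ${1%/}/$sub"
    done
}

# Example use:
#   check_mpi_root /usr/lib64/mpi/gcc/openmpi/
```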

| I also tried with
| 
| R CMD INSTALL  Rmpi_0.5-6.tar.gz
| --configure-args=--with-mpi=/usr/lib64/mpi/gcc/openmpi/
| 
| with the same results.
| Any help would be greatly appreciated.

I'd recommend having a look at configure.ac, which is pretty straightforward,
and 'helping' it with the locations you have. I had to do the same for
Debian's Rmpi due to us also having mpich and lam.

Hth,  Dirk

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] Warnings in Ubuntu Hardy

2008-09-06 Thread Dirk Eddelbuettel

On 6 September 2008 at 22:13, Davi Vercillo C. Garcia () wrote:
| I'm trying to execute some programs in my notebook (Ubuntu 8.04) using
| OpenMPI, and I always get a warning message like:
| 
| libibverbs: Fatal: couldn't read uverbs ABI version.
| --
| [0,0,0]: OpenIB on host juliana was unable to find any HCAs.
| Another transport will be used instead, although this may result in
| lower performance.
| --
| 
| What is this ?!

Uncomment this in /etc/openmpi/openmpi-mca-params.conf:

  # Disable the use of InfiniBand
  btl = ^openib

which is the default in newer packages.
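
If editing the system-wide file is not an option, the same setting can be tried in the per-user parameter file that Open MPI also reads, $HOME/.openmpi/mca-params.conf. A minimal sketch; disable_openib is a made-up helper, and it leaves the file alone if an active btl line is already present:

```shell
# disable_openib FILE: append "btl = ^openib" to FILE unless it already
# contains an uncommented btl setting (hypothetical helper).
disable_openib() {
    conf=$1
    mkdir -p "$(dirname "$conf")"
    touch "$conf"
    if ! grep -Eq '^[[:space:]]*btl[[:space:]]*=' "$conf"; then
        printf '%s\n' '# Disable the use of InfiniBand' 'btl = ^openib' >> "$conf"
    fi
}

# Example use:
#   disable_openib "$HOME/.openmpi/mca-params.conf"
```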

Dirk

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] cluster LiveCD

2008-08-05 Thread Dirk Eddelbuettel

On 5 August 2008 at 17:01, Ben Payne wrote:
| Hello.  I am not sure if this is the correct list to ask this
| question, so if you know of a more appropriate one please let me know.
| 
| I think I am looking for a LiveCD that supports MPI, specifically one
| that has mpif90 built in, and can easily mount external (USB) drives
| for storing data.
| 
| I have access to 40 Windows computers in a lab that rarely gets used.
| I would like to use the computers to run a cluster during the
| weekends, but be able to not mess with the Windows installation that
| exists on the hard drive. Because of this, I think a LiveCD would be
| good, and one that supports PXE booting is even better.  If there is a
| better way to do this (run MPI, not disrupt Windows) please let me
| know.
| 
| The applications that I want to run are originally written in
| Fortran90 and have been ported to MPI (by me) and compile with mpif90.
|  I have attempted to use ParallelKnoppix and PelicanHPC (see below)
| and have spoken to the author of the distro, but he isn't explicitly
| supporting mpif90.
| 
| From the list at
| http://www.knoppix.net/wiki/Cluster_Live_CD
| I have tried many distros.  Below are the results of my attempts with
| the LiveCDs:
| 
| BCCD 2.2.1c7 [DHCP server, ssh "heartbeats"]:
| http://bccd.cs.uni.edu/
| mpiexec "command not found"
| mpirun -np 1 ./a.out   : "cannot execute binary file"
| mpif90: "no fortran 90 compiler specified when mpif90 was created"
| gfortran: "command not found"
| 
| PelicanHPC (Debian)   [DHCP server, PXE boot]:
| http://pareto.uab.es/mcreel/PelicanHPC/
| mpirun works
| mpiexec works
| gfortran works
| mpif90: "command not found"
| 
| ParallelKnoppix 2.9 (Knoppix) [DHCP server, PXE boot]:
| http://idea.uab.es/mcreel/ParallelKnoppix/
| mpirun -np 1 ./a.out : "cannot execute binary file"
| mpiexec -np 1 ./a.out : "cannot execute binary file", "mpirun
| failed with exit status 252"
| gfortran works
| mpif90: "command not found"
| ifort: "command not found"
| lamexec -np 1 ./a.out: "cannot execute binary file"
| Note: mounting external drives is most intuitive in PK
| 
| ClusterKnoppix: OpenMOSIX (no MPI)
| http://clusterknoppix.sw.be/
| 
| CHAOS: OpenMOSIX (no MPI)
| 
| Pai Pix: could not find on internet

You forgot Quantian, a large extension of clusterKnoppix that has OpenMOSIX as
well as MPI and PVM -- but I haven't updated it in a long-ish while. I think
I included LAM and MPICH, but I doubt it had f90.

However, it so happens that I am currently updating it (in a smaller format)
for a tutorial I am giving next week [1] and which covers MPI.  So the live
cdrom (based on current Debian testing, which will soon be the next release)
will give you a working KDE environment, emacs, and current Open MPI from
Debian which includes mpif77 and mpif90, plus a bunch of R goodies you may
not care too much about.  Interested? Ping me off-list.

Dirk

[1] http://www.statistik.uni-dortmund.de/useR-2008//tutorials/eddelbuettel.html

| Thanks for your help,
| 
| Ben
| ___
| users mailing list
| us...@open-mpi.org
| http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] MPI_Init segfault on Ubuntu 8.04 version 1.2.7~rc2

2008-07-28 Thread Dirk Eddelbuettel

On 24 July 2008 at 14:39, Adam C Powell IV wrote:
| Greetings,
| 
| I'm seeing a segfault in a code on Ubuntu 8.04 with gcc 4.2.  I
| recompiled the Debian lenny openmpi 1.2.7~rc2 package on Ubuntu, and
| compiled the Debian lenny petsc and libmesh packages against that.
| 
| Everything works just fine in Debian lenny (gcc 4.3), but in Ubuntu
| hardy it fails during MPI_Init:
| 
| [Thread debugging using libthread_db enabled]
| [New Thread 0x7faceea6f6f0 (LWP 5376)]
| 
| Program received signal SIGSEGV, Segmentation fault.
| [Switching to Thread 0x7faceea6f6f0 (LWP 5376)]
| 0x7faceb265b8b in _int_malloc () from /usr/lib/libopen-pal.so.0
| (gdb) backtrace
| #0  0x7faceb265b8b in _int_malloc () from /usr/lib/libopen-pal.so.0
| #1  0x7faceb266e58 in malloc () from /usr/lib/libopen-pal.so.0
| #2  0x7faceb248bfb in opal_class_initialize ()
|    from /usr/lib/libopen-pal.so.0
| #3  0x7faceb25ce2b in opal_malloc_init () from /usr/lib/libopen-pal.so.0
| #4  0x7faceb249d97 in opal_init_util () from /usr/lib/libopen-pal.so.0
| #5  0x7faceb249e76 in opal_init () from /usr/lib/libopen-pal.so.0
| #6  0x7faced05a723 in ompi_mpi_init () from /usr/lib/libmpi.so.0
| #7  0x7faced07c106 in PMPI_Init () from /usr/lib/libmpi.so.0
| #8  0x7facee144d92 in libMesh::init () from /usr/lib/libmesh.so.0.6.2
| #9  0x00411f61 in main ()
| 
| libMesh::init() just has an assertion and command line check before
| MPI_Init, so I think it's safe to conclude this is an OpenMPI problem.
| 
| How can I help to test and fix this?
| 
| This might be related to Vincent Rotival's problem in
| http://www.open-mpi.org/community/lists/users/2008/04/5427.php or maybe
| http://www.open-mpi.org/community/lists/users/2008/05/5668.php .  On the
| latter, I'm building the Debian package, which should have the
| LDFLAGS="" fix.  Hmm, nope, no LDFLAGS anywhere in the .diff.gz...  The
| OpenMPI top-level Makefile has
| "LDFLAGS = -export-dynamic -Wl,-Bsymbolic-functions"

What bit us in the second bug report you refer to there was that _Ubuntu_ set
this LDFLAGS value in their binutils settings for hardy.  We do (did?) not
(or at least not yet) do that in Debian -- the binutils there do not add
LDFLAGS which is why we do not unset anything in the debian/rules for ompi.

As I recall, updated packages for Ubuntu hardy have been fixed, i.e. have been
built without the bad LDFLAGS value.
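
To see whether a given build tree picked up such a flag, the generated top-level Makefile can be checked for an LDFLAGS assignment mentioning -Bsymbolic. A sketch under the assumption that the assignment sits on a single line as quoted above; ldflags_has_bsymbolic is a made-up helper, and it matches both -Bsymbolic and -Bsymbolic-functions.

```shell
# ldflags_has_bsymbolic MAKEFILE: succeed if an LDFLAGS assignment in
# MAKEFILE mentions -Bsymbolic (hypothetical helper).
ldflags_has_bsymbolic() {
    grep -E '^[[:space:]]*LDFLAGS[[:space:]]*=' "$1" | grep -q -e '-Bsymbolic'
}

# Example use in the Open MPI build directory:
#   ldflags_has_bsymbolic Makefile && echo "rebuild with LDFLAGS unset"
```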

Hope this helps,  Dirk

| 
| -Adam
| -- 
| GPG fingerprint: D54D 1AEE B11C CE9B A02B  C5DD 526F 01E8 564E E4B6
| 
| Engineering consulting with open source tools
| http://www.opennovation.com/
-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] Crash in _int_malloc via MPI_Init

2008-06-16 Thread Dirk Eddelbuettel

On 15 June 2008 at 17:11, Brian Barrett wrote:
| On Jun 15, 2008, at 2:20 PM, Dirk Eddelbuettel wrote:
| 
| > Yup: I still suspect compiler / linker changes in Ubuntu between Gutsy
| > (released Oct 2007) and Hardy (April 2008).
| >
| > Why? Because the exact same source package for Open MPI (as maintained by
| > Manuel and myself for Debian) works for me on Ubuntu Hardy __if I compile it
| > on Ubuntu Gutsy__.
| >
| > Now, I reported this to Ubuntu ... but got no answer.  Lucas and Christoph at
| > Debian today released a feature allowing us Debian maintainers to see which
| > of our packages have bugreports in Ubuntu.  It was only through this mechanism
| > that I learned that the segfault I saw with Rmpi (using Open MPI) had been
| > experienced by someone else, and that a similar bug occurs with Python use on
| > top of Open MPI.
| >
| > But still no tangible answer from Canonical / Ubuntu other than some
| > reshuffling of bug report titles and numbers.  Very disappointing.
| >
| > I am CCing Steffen and Andreas who've seen similar bugs and are awaiting
| > answers too.  I am also CCing Cesare at Ubuntu who did the bug rearrangement,
| > maybe he will find a moment to share their plans with us.
| 
| I suppose I'm glad that it doesn't look like an Open MPI problem.  Due  

Yup. Just heard from the fellow at Ubuntu/Canonical: they broke things via 
LDFLAGS="-Wl,-Bsymbolic" which makes Open MPI fall on its face due to the
three distinct libraries...  Setting LDFLAGS="" as we do for Debian overcomes
the problem.

Cheers, Dirk


| to continual problems with the ptmalloc2 code in Open MPI, we've
| decided that for v1.3, we'll extract that code out into its own
| library.  Users who need the malloc hooks for InfiniBand support
| (only a small number of applications really benefit from it) will have
| to explicitly link in the extra library.  Hopefully, this will resolve
| some of these headaches.

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] Crash in _int_malloc via MPI_Init

2008-06-15 Thread Dirk Eddelbuettel

On 15 June 2008 at 15:53, Andreas Klöckner wrote:
| On Mittwoch 14 Mai 2008, Andreas Klöckner wrote:
| > Hi there,
| >
| > I would like to put this crash bug [1] that Sam Adams pointed out back on
| > the radar--I ran into this, and there's also an Ubuntu bug [2] (which also
| > contains my stack trace).
| >
| > Anybody have an idea what could cause this?
| >
| > Thanks,
| > Andreas
| >
| > [1] http://www.open-mpi.org/community/lists/users/2007/08/3844.php
| > [2] https://bugs.launchpad.net/bugs/210273
| 
| Dirk Eddelbuettel has pinpointed this to (likely) be a binutils issue (in 
| Ubuntu, among others, not in Debian)
| 
| See https://bugs.launchpad.net/ubuntu/+source/binutils/+bug/234837

Yup: I still suspect compiler / linker changes in Ubuntu between Gutsy
(released Oct 2007) and Hardy (April 2008).

Why? Because the exact same source package for Open MPI (as maintained by
Manuel and myself for Debian) works for me on Ubuntu Hardy __if I compile it
on Ubuntu Gutsy__.

Now, I reported this to Ubuntu ... but got no answer.  Lucas and Christoph at
Debian today released a feature allowing us Debian maintainers to see which
of our packages have bugreports in Ubuntu.  It was only through this mechanism
that I learned that the segfault I saw with Rmpi (using Open MPI) had been
experienced by someone else, and that a similar bug occurs with Python use on
top of Open MPI.

But still no tangible answer from Canonical / Ubuntu other than some
reshuffling of bug report titles and numbers.  Very disappointing.

I am CCing Steffen and Andreas who've seen similar bugs and are awaiting
answers too.  I am also CCing Cesare at Ubuntu who did the bug rearrangement,
maybe he will find a moment to share their plans with us.

Tschoe,  Dirk


| Andreas
| 
-- 
Three out of two people have difficulties with fractions.



Re: [OMPI users] Fwd: mpicc does not link against Fortran libraries? Withdrawn

2008-04-25 Thread Dirk Eddelbuettel

On 25 April 2008 at 20:10, Barry Smith wrote:
| 
| 
| A smarter colleague than I has reminded me that it is very difficult
| to obtain all the Fortran libraries and linker options that would be
| needed to allow the mpicc compiler to also link against the MPI Fortran
| libraries successfully. I therefore withdraw my original question.

Well yes, which is why Open MPI gives you the _Fortran_ wrappers

mpif77
mpif90

in addition to mpicc and mpic++ --- did you try those?  And strictly
speaking, there is no 'mpicc compiler' but a bunch of libraries etc that are
used along with the standard GNU Compiler Collection (aka gcc, g++, gfortran
et al).
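
Open MPI's wrappers can print the underlying compiler command they would run via their --showme option, which makes the "wrapper around gcc/gfortran" point easy to see. A small sketch; show_wrapper is a made-up helper that simply guards against the wrapper not being installed:

```shell
# show_wrapper NAME: print the real compiler command behind an Open MPI
# wrapper such as mpicc or mpif90, or a note if it is not installed
# (hypothetical helper).
show_wrapper() {
    if command -v "$1" >/dev/null 2>&1; then
        "$1" --showme
    else
        echo "$1: not found in PATH"
    fi
}

# Example use:
#   show_wrapper mpicc
#   show_wrapper mpif90
```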

Hope this helps, and greetings to Argonne from 15+ miles northeast, 

Dirk

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] openMPI + Ubuntu 7.10 puzzling

2008-04-22 Thread Dirk Eddelbuettel

On 22 April 2008 at 00:12, Vincent Rotival wrote:
| Sorry to bother you all about that but I am quite lost with a puzzling 
| problem concerning openMPI + Ubuntu 7.10. I could not find similar 
| threads on the archive
[...]
| I am using openMPI version 1.2.6 (but same bug occurred with 1.2.5),
| which I compile directly from the source from www.openmpi.org, with
| F90=ifort with Intel Fortran 10.1

I can't speak to ifort, but as one of the Open MPI maintainers for Debian, I
can assure you that the 1.2.* series works just fine on Ubuntu ... if you
rebuild from current Debian sources. 

I have forgotten what version of Open MPI made the 'freeze' for Ubuntu 7.10,
but it is probably something older if not even the 1.1.*.  I typically just
point my apt inputs to Debian unstable source (not binary packages) and then
fetch what I want to rebuild via 'apt-get source libopenmpi-dev'. That
requires some minimal Debian packaging skills you could learn from a number
of sources on the web.

Now, if and when you rebuild from source, I would make sure that you do not
have 'native' Ubuntu Open MPI and LAM '-dev' packages installed to avoid the
header / library mismatch you seem to be experiencing.

| I have not updated ifort since about 6 months, the only change between 
| last time I used MPI are small Ubuntu updates. I can give you much 
| more complex codes which worked perfectly one week ago

"small" Ubuntu updates?  When I roll my work machines to new Ubuntu releases,
it upgrades _hundreds_ of packages. Not what I call small.

Hope this helps, Dirk


-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] Fwd: R npRmpi

2007-12-18 Thread Dirk Eddelbuettel

On 18 December 2007 at 16:08, Randy Heiland wrote:
| The pkg in question is here:  http://www.stats.uwo.ca/faculty/yu/Rmpi/
| 
| The question is:  has anyone on this list got OpenMPI working for  
| this pkg?  Any suggestions?

Yes -- I happen to maintain GNU R, a number of R packages (eg r-cran-*) and
more for Debian and am also part of Debian's Open MPI maintainer group. I
also use Rmpi at work.

Dr Yu and I sorted out all relevant issues a few weeks ago and the most
current Rmpi (ie 0.5-5) works out of the box on Debian and Ubuntu, and is
current in Debian.  It should "just work" on any other recent Linux and Unix
distro.  If not please report back what configure reports and where it fails.

[ As an aside, we do have a current bug in Debian unstable with Open MPI as
we're trying to make transition between LAM, MPICH and Open MPI more
bullet-proof. If you use just Open MPI you should already be fine. ]

Greetings from Chicago,  Dirk

| 
| thanks, Randy
| 
| 
| Begin forwarded message:
| 
| >
| > Subject: R npRmpi
| >
| > Been looking into the npRmpi problem
| >
| > I can get a segfault executing
| >> mpi.spawn.Rslaves()
| >
| > The documentation .html files under npRmpi contains the following:
| >
| > "mpi.spawn.Rslaves to spawn R slaves on selected hosts. This is
| > a LAM-MPI specific function."
| >
| >> lamhosts()
| > sh: lamnodes: command not found
| >
| > The documentation for nearly all mpi.xxx.xxx calls send you to
| > www.lam-mpi.org for more information.
| >
| > Looks for all the world this package depends on LAM-MPI which
| > is not installed on Quarry. I don't think pointing the build
| > at an OpenMPI install will help. The .c sources will compile
| > just fine but when R goes to use them it refers to LAM-MPI
| > dependent functions and behaves  badly.
| >
| 
-- 
Three out of two people have difficulties with fractions.



Re: [OMPI users] Open MPI 1.2.4 verbosity w.r.t. osc pt2pt

2007-12-13 Thread Dirk Eddelbuettel

On 13 December 2007 at 13:17, Lisandro Dalcin wrote:
| Perhaps I was not clear enough. There are many public ways of
| importing modules in Python. Modules can come mainly from two sources:
| pure Python code, or C compiled code. In the latter case (called
| extension modules), they are normally a shared object
| (.so,.dylib,.dll) exporting only a symbol: 'void
| init(void)', this function bootstraps the initialization
| of the extension module. What is somewhat hidden is the actual way
| Python loads this shared object and calls the init function. I believe
| the reason for this is related to the gory details of dlopen()ing in
| different OS's/archs combination.
| 
| Anyway, Python enables you to temporarily change the flags to be used
| in dlopen() calls, what is not (currently) so easy is to get the value
| of RTLD_GLOBAL in a portable way.
| 
| Jeff, in short: I believe I solved (with the help of Brian) the issue
| in the context of Python and the platforms we have access to. So, from
| our side, things are working.
| 
| However, I believe this issue is going to cause trouble to any other
| guy trying  to load shared objects using MPI. This makes me think that
| Open MPI should be in charge of solving this, but I understand the
| situation is complex and all of us are usually out of time. I'll try to
| re-read all the stuff to better understand the beast.

Just to recap: when we tried to address the same issue for the 'Rmpi' package
for GNU R, it was actually the hint in the FAQ for Open MPI itself (!!) that led
Hao (ie Rmpi's author) to the use of the RTLD_GLOBAL flag.  So what Lisandro
is asking for is already (at least somewhat) addressed and documented at the
Open MPI site.
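
For the Python side, the flag juggling Lisandro describes can be sketched
roughly as follows. This is only an illustration and not the code mpi4py
uses: import_with_rtld_global is a hypothetical helper name, and the sketch
assumes a Unix Python 3 where os.RTLD_GLOBAL and sys.setdlopenflags are
available (the quoted message notes that getting RTLD_GLOBAL portably was
not easy at the time).

```python
import importlib
import os
import sys


def import_with_rtld_global(module_name):
    """Import a module while Python's dlopen() calls use RTLD_GLOBAL.

    Hypothetical helper for illustration; the old flags are always restored.
    """
    old_flags = sys.getdlopenflags()
    sys.setdlopenflags(old_flags | os.RTLD_GLOBAL)
    try:
        return importlib.import_module(module_name)
    finally:
        sys.setdlopenflags(old_flags)


if __name__ == "__main__":
    # "json" merely stands in for an MPI extension module here.
    module = import_with_rtld_global("json")
    print(module.__name__)
```

Restoring the flags in the finally clause keeps the change temporary, so
unrelated extension modules imported later are loaded with the defaults.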

Anyway, great to hear that things work for Python too. It's always good to
have more tools.

Dirk

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] Is anyone researching PGAPack

2007-11-23 Thread Dirk Eddelbuettel

On 22 November 2007 at 17:08, yh sun wrote:
| hi ,
|Recently I have been practising with PGAPack. Has anyone done similar work?

I have been using PGApack with Open MPI.  I have also prepared a draft of what
should "soon" become a new / updated PGApack release, primarily with an
updated / newer license.

| and by the way who knows the e-mail of David Levine who is the developer of
| PGAPack. thanks very much

I do have David's email via his former mentors and colleagues at Argonne --
but David 'retired' from PGApack many, many years ago.  

Hope this helps,  Dirk

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] Cannot suppress openib error message

2007-10-25 Thread Dirk Eddelbuettel

On 25 October 2007 at 07:54, Jeff Squyres wrote:
| We will not dlopen libibverbs.so directly -- we will only dlopen the  
| mca_btl_openib.so file.  The dynamic linker will automatically open  
| all of its dependencies.  If those dependencies cannot be found /  
| symbols cannot be resolved, the dynamic linker will fail the dlopen  
| of libibverbs.
| 
| Can you run "ldd mca_btl_openib.so" on your head node and your  
| compute nodes?  See if there's a difference in the output.  I think  
| this is the next step in this troubleshooting process...

Sure, good idea.

head and build machine:

$ ldd /usr/lib/openmpi/mca_btl_openib.so
linux-gate.so.1 =>  (0xe000)
libibverbs.so.1 => /usr/lib/libibverbs.so.1 (0xb7f42000)
libpthread.so.0 => /lib/libpthread.so.0 (0xb7f2b000)
libmpi.so.0 => /usr/lib/libmpi.so.0 (0xb7ea6000)
libopen-rte.so.0 => /usr/lib/libopen-rte.so.0 (0xb7e52000)
libopen-pal.so.0 => /usr/lib/libopen-pal.so.0 (0xb7dfb000)
libdl.so.2 => /lib/libdl.so.2 (0xb7df7000)
libnsl.so.1 => /lib/libnsl.so.1 (0xb7de1000)
libutil.so.1 => /lib/libutil.so.1 (0xb7ddd000)
libm.so.6 => /lib/libm.so.6 (0xb7db7000)
libc.so.6 => /lib/libc.so.6 (0xb7c8a000)
/lib/ld-linux.so.2 (0x8000)

compute node:
$ ldd /usr/lib/openmpi/mca_btl_openib.so
/usr/lib/openmpi/mca_btl_openib.so: /usr/lib/libibverbs.so.1: version `IBVERBS_1.1' not found (required by /usr/lib/openmpi/mca_btl_openib.so)
linux-gate.so.1 =>  (0xe000)
libibverbs.so.1 => /usr/lib/libibverbs.so.1 (0xb7ee6000)
libpthread.so.0 => /lib/tls/i686/cmov/libpthread.so.0 (0xb7ecf000)
libmpi.so.0 => /usr/lib/libmpi.so.0 (0xb7e4a000)
libopen-rte.so.0 => /usr/lib/libopen-rte.so.0 (0xb7df6000)
libopen-pal.so.0 => /usr/lib/libopen-pal.so.0 (0xb7d9f000)
libdl.so.2 => /lib/tls/i686/cmov/libdl.so.2 (0xb7d9b000)
libnsl.so.1 => /lib/tls/i686/cmov/libnsl.so.1 (0xb7d84000)
libutil.so.1 => /lib/tls/i686/cmov/libutil.so.1 (0xb7d8)
libm.so.6 => /lib/tls/i686/cmov/libm.so.6 (0xb7d58000)
libc.so.6 => /lib/tls/i686/cmov/libc.so.6 (0xb7c17000)
libsysfs.so.2 => /lib/libsysfs.so.2 (0xb7c0c000)
/lib/ld-linux.so.2 (0x8000)

Bingo!!  And I am being caught with an inconsistent package install. Tsk tsk.
I *think* this is because at one point, before "we" (as in the few folks
looking after the .deb for Open MPI) had learned about the 'btl ^openib'
option, I had become so disenchanted with the 'noisy' message that I hacked
libibverbs.  That may explain the head node.  Let me get that one back to the
pristine Ubuntu / Debian package, and then possibly rebuild the Open MPI
package there to get the Depends right.
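
The head-node versus compute-node comparison done above by eye can also be
scripted. The sketch below is only an illustration: ldd_problems is a made-up
helper name, it works on ldd output already captured as text, and the two
patterns are assumptions based on the message shown above plus the usual
"=> not found" form. Other ldd versions may word things differently.

```python
import re

# Complaint seen on the compute node, e.g.
#   a.so: /usr/lib/libfoo.so.1: version `FOO_1.1' not found (required by a.so)
VERSION_NOT_FOUND = re.compile(r"(\S+): version [`'](.+?)' not found")
# A dependency the dynamic linker cannot locate at all.
LIB_NOT_FOUND = re.compile(r"(\S+) => not found")


def ldd_problems(ldd_output):
    """Return (kind, library, version) tuples for problems in ldd output text."""
    # Mail clients hard-wrap long lines, so collapse all whitespace first.
    flat = " ".join(ldd_output.split())
    problems = [("version", lib, ver) for lib, ver in VERSION_NOT_FOUND.findall(flat)]
    problems += [("missing", lib, None) for lib in LIB_NOT_FOUND.findall(flat)]
    return problems


if __name__ == "__main__":
    compute_node = """\
/usr/lib/openmpi/mca_btl_openib.so: /usr/lib/libibverbs.so.1: version
`IBVERBS_1.1' not found (required by /usr/lib/openmpi/mca_btl_openib.so)
libibverbs.so.1 => /usr/lib/libibverbs.so.1 (0xb7ee6000)
"""
    for problem in ldd_problems(compute_node):
        print(problem)
```

Running the same check on every node and comparing the results would flag
an inconsistent install like this one without reading the full listings.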

Thanks so much for your help and patience on this.

Dirk

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] Cannot suppress openib error message

2007-10-24 Thread Dirk Eddelbuettel

On 24 October 2007 at 21:31, Jeff Squyres wrote:
| On Oct 24, 2007, at 9:23 PM, Dirk Eddelbuettel wrote:
| 
| > | If I had to guess, the systems where you don't see the warning are
| > | systems that have OFED loaded.
| >
| > I am pretty sure that none of the systems (at work) have IB  
| > hardware.  I am
| > very sure that my home systems do not, and there the 'btl = ^openib'
| > successfully suppresses the warning --- whereas at work it doesn't.
| 
| Note that you don't need to have IB hardware -- all you need is the  
| OFED software loaded.  I don't know if Debian ships the OFED  
| libraries by default...?  In particular, look for libibverbs:
| 
| [18:28] svbu-mpi:~/svn/ompi % ldd $bogus/lib/openmpi/mca_btl_openib.so
|  libibverbs.so.1 => /usr/lib64/libibverbs.so.1  
| (0x002a956c2000)
|  libnsl.so.1 => /lib64/libnsl.so.1 (0x002a957cd000)
|  libutil.so.1 => /lib64/libutil.so.1 (0x002a958e4000)
|  libm.so.6 => /lib64/tls/libm.so.6 (0x002a959e8000)
|  libpthread.so.0 => /lib64/tls/libpthread.so.0  
| (0x002a95b6e000)
|  libc.so.6 => /lib64/tls/libc.so.6 (0x002a95c83000)
|  libdl.so.2 => /lib64/libdl.so.2 (0x002a95eb8000)
|  /lib64/ld-linux-x86-64.so.2 (0x00552000)

Good point.  However, I use the .deb packages which I build for Debian,
and they use libibverbs where available:

Build-Depends: [...], libibverbs-dev [!kfreebsd-i386 !kfreebsd-amd64 \
!hurd-i386], gfortran, libsysfs-dev, automake, gcc (>= 4:4.1.2)

in particular on i386. Consequently, the binary package ends up with a
Depends on the run-time package 'libibverbs1' -- and this will hence always
be present as all my systems use the .deb packages (either from Debian or
locally rebuilt) that force libibverbs1 in via this Depends.

At work, I rebuild these same packages under Ubuntu on my "head node".  And
on the head node, no warning is seen -- whereas my compute nodes issue the
warning.

Could this be another one of the dlopen issues where basically
dlopen("libibverbs.so") 
is executed?   Because the compute nodes do NOT have libibverbs.so (from the
-dev package) but only libibverbs.so.1.0.0 and its matching symlink
libibverbs.so.1.

I just tested that hypothesis and installed libibverbs-dev, but no luck. I still
get the warning. 
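
To make the hypothesis concrete, here is a small sketch that lists which
names of a library are present in a directory. It is an illustration only:
library_names_present is a hypothetical helper, and the demo recreates the
compute-node layout described above in a scratch directory rather than
inspecting a real /usr/lib.

```python
import os
import tempfile


def library_names_present(libdir, stem):
    """Map each '<stem>.so*' entry in libdir to its symlink target (None for a plain file)."""
    found = {}
    for entry in sorted(os.listdir(libdir)):
        if entry == stem + ".so" or entry.startswith(stem + ".so."):
            path = os.path.join(libdir, entry)
            found[entry] = os.readlink(path) if os.path.islink(path) else None
    return found


if __name__ == "__main__":
    # Scratch copy of the layout: the real file plus its versioned symlink only.
    with tempfile.TemporaryDirectory() as libdir:
        open(os.path.join(libdir, "libibverbs.so.1.0.0"), "w").close()
        os.symlink("libibverbs.so.1.0.0", os.path.join(libdir, "libibverbs.so.1"))
        names = library_names_present(libdir, "libibverbs")
        print(names)
        print("unversioned name present:", "libibverbs.so" in names)
```

A component that was linked against the versioned name libibverbs.so.1
does not need the unversioned libibverbs.so at run time, which would be
consistent with the -dev package making no difference here.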

| However, I note something in your last reply that I may have missed  
| before -- can you clarify a point for me: are you saying that on your  
| home machine, this generates the openib "file not found" warning:
| 
|  mpirun -np 2 hello
| 
| but this does not:
| 
|  mpirun -np 2 --mca btl ^openib hello

More or less, but I use /etc/openmpi/openmpi-mca-params.conf to toggle
^openib.  Adding it again as --mca btl ^openib changes nothing, unfortunately.

| If so, can you confirm which version of Open MPI you are running?   
| The only reason that I can think that that would happen is if you are  
| running a trunk nightly download of Open MPI...  If not, then there's  
| something else going on that would be worth understanding.

No, plain 1.2.4 from the original tarballs.

Still puzzled.  To recap, the head node and the compute node all use the same
Ubuntu release and the same binary .deb packages of Open MPI 1.2.4 that I
rebuilt there.  The 'sole' difference is that the 'head node' has more
development packages and tools installed -- but that should not matter.  I
just re-checked and the compute node does not have any LAM or MPICH
parts remaining.

Dirk

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] Cannot suppress openib error message

2007-10-24 Thread Dirk Eddelbuettel

On 24 October 2007 at 16:22, Jeff Squyres wrote:
| On Oct 24, 2007, at 4:16 PM, Dirk Eddelbuettel wrote:
| 
| > I buy that explanation any day, but what is funny is that the
| > btl = ^openib
| > does suppress the warning on some of my systems (all running 1.2.4)  
| > but not
| > others (also running 1.2.4).
| 
| If I had to guess, the systems where you don't see the warning are  
| systems that have OFED loaded.

I am pretty sure that none of the systems (at work) have IB hardware.  I am
very sure that my home systems do not, and there the 'btl = ^openib'
successfully suppresses the warning --- whereas at work it doesn't.

Must be a side-effect from something else. I made sure no LAM libs were
left around.  

Dirk


-- 
Three out of two people have difficulties with fractions.


[OMPI users] Cannot suppress openib error message

2007-10-24 Thread Dirk Eddelbuettel

I've been scratching my head over this:

lnx01:/usr/lib> orterun -n 2  --mca btl ^openib  ~/c++/tests/mpitest
[lnx01:14417] mca: base: component_find: unable to open btl openib: file not found (ignored)
[lnx01:14418] mca: base: component_find: unable to open btl openib: file not found (ignored)
Hello world, I'm process 0
Hello world, I'm process 1
lnx01:/usr/lib> grep openib /etc/openmpi/openmpi-mca-params.conf
#   btl = ^openib
btl = ^openib
lnx01:/usr/lib> orterun -n 2   ~/c++/tests/mpitest
[lnx01:14429] mca: base: component_find: unable to open btl openib: file not found (ignored)
[lnx01:14430] mca: base: component_find: unable to open btl openib: file not found (ignored)
Hello world, I'm process 0
Hello world, I'm process 1

and when I strace it, I get

uname({sys="Linux", node="lnx01", ...}) = 0
open("/etc/openmpi/openmpi-mca-params.conf", O_RDONLY) = 3
ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbf820698) = -1 ENOTTY (Inappropriate ioctl for device)
fstat64(3, {st_mode=S_IFREG|0644, st_size=2877, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f72000
read(3, "#\n# Copyright (c) 2004-2005 The "..., 8192) = 2877
read(3, "", 4096)   = 0
read(3, "", 8192)   = 0
ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbf8205f8) = -1 ENOTTY (Inappropriate ioctl for device)
close(3)= 0
munmap(0xb7f72000, 4096)= 0

Why can't I suppress the dreaded Infiniband message?

System is Ubuntu 7.04 with 'ported' (ie locally recompiled) current Open MPI
packages from Debian.
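
To double-check what the params file actually says, the grep above can be
replaced by a tiny parser. This is a minimal sketch under stated
assumptions: parse_mca_params and excluded_btls are hypothetical names, and
the parser only understands the simple "name = value" lines and "#" comments
visible in the grep output; the real Open MPI parser may accept more syntax.
The leading "^" is read as "exclude the listed components".

```python
def parse_mca_params(text):
    """Parse 'name = value' lines, skipping blank lines and '#' comments."""
    params = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a 'name = value' line; ignored in this sketch
        params[name.strip()] = value.strip()
    return params


def excluded_btls(params):
    """Return the BTL components excluded via a leading '^', if any."""
    value = params.get("btl", "")
    if value.startswith("^"):
        return [name for name in value[1:].split(",") if name]
    return []


if __name__ == "__main__":
    # The two lines that grep printed from openmpi-mca-params.conf.
    sample = "#   btl = ^openib\nbtl = ^openib\n"
    params = parse_mca_params(sample)
    print(params)
    print(excluded_btls(params))
```

In this case the commented-out line is skipped and the active line is
picked up, so the file itself looks fine and the cause lies elsewhere.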

Dirk

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] Open MPI 1.2.4 verbosity w.r.t. osc pt2pt

2007-10-12 Thread Dirk Eddelbuettel

Brian,

On 12 October 2007 at 13:55, Brian Granger wrote:
| > My guess is that Rmpi is dynamically loading libmpi.so, but not
| > specifying the RTLD_GLOBAL flag.  This means that libmpi.so is not
| > available to the components the way it should be, and all goes
| > downhill from there.  It only mostly works because we do something
| > silly with how we link most of our components, and Linux is just
| > smart enough to cover our rears (thankfully).
| 
| In mpi4py, libmpi.so is linked in at compile time, not loaded using
| dlopen.  Granted, the resulting mpi4py binary is loaded into python
| using dlopen.

AFAIK that is the same for all dynamically loaded extensions for all 'host
systems' I've used (ie Perl, Python, Octave, R, ...), and of course also Rmpi
for R.

But I looked some more at mpi4py and the mpipython front-ends for Python. As
I recall, it mentions explicitly that argc/argv need to go to MPI_Init before
Python does its thing with them.  So other caveats may apply -- you may have
to tweak the engine proper.  For us, GNU R is unaltered and the suggested
change is local to the Rmpi extension package.

| >- Someone could fix Rmpi to dlopen libmpi.so with the RTLD_GLOBAL
| >  flag and fix the problem properly.
| 
| Again, my main problem with this solution is that it means I must both
| link to libmpi at compile time and load it dynamically using dlopen.
| This doesn't seem right.  Also, it makes it impossible on OS X to

IIRC this is about the RTLD_GLOBAL flag.  The default load doesn't use it,
so not all symbols are exported and we get the warning.  The code still works,
mind you.  

Using the explicit dlopen with the correct flag, the (spurious) warning
disappears.  That's what we wanted.

| avoid setting LD_LIBRARY_PATH (OS X doesn't have rpath).  Being able
| to use openmpi without setting LD_LIBRARY_PATH is important.

We do not set LD_LIBRARY_PATH. The libmpi.so library still sits where it
always has, in /usr/lib, where ld.so looks anyway.  So this doesn't apply.

Regards, Dirk

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] Open MPI 1.2.4 verbosity w.r.t. osc pt2pt

2007-10-10 Thread Dirk Eddelbuettel

On 10 October 2007 at 15:27, Brian Granger wrote:
| I am seeing the same error, but I am using mpi4py (Lisandro Dalcin's
| Python MPI bindings).  I don't think that libmpi.so is being dlopen'd
| directly at runtime, but, the shared library that is linked at compile
| time to libmpi.so is probably being loaded at runtime.  The odd thing
| is that mpi4py has been tested extensively with openmpi and this is
| the first version of openmpi that we have seen this issue.  I tried
| 1.2.3 again yesterday and it works fine.  What changed with 1.2.4?
| 
| The problem with our case is that the code that is doing the dlopen is
| deep inside Python itself (not just mpi4py).  It is the same code that

That's the same for R. We don't touch the inner guts of module loading for
this. What Hao realized after looking at the corresponding FAQ item was that
right before calling MPI_Init, one can load libmpi explicitly, and -- and
that's the important bit -- set the proper RTLD_GLOBAL argument.  

So you could adapt the patch we used:

   a) add an include for dlfcn.h

   b) explicitly call dlopen on libmpi.so with RTLD_GLOBAL

That should be reasonably easy to test as you only need to rebuild mpi4py:


--- rmpi-0.5-4.orig/src/Rmpi.c
+++ rmpi-0.5-4/src/Rmpi.c
@@ -16,6 +16,7 @@
  */

 #include "Rmpi.h"
+#include <dlfcn.h>

 static MPI_Comm*comm;
 static MPI_Status *status;
@@ -32,7 +33,9 @@
 if (flag)
return AsInt(1);
else {  
-   MPI_Init((void *)0,(void *)0);
+   char *libm="libmpi.so";
+   dlopen(libm,RTLD_GLOBAL);
+   MPI_Init((void *)0,(void *)0);
MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
MPI_Errhandler_set(MPI_COMM_SELF, MPI_ERRORS_RETURN);
comm=(MPI_Comm *)Calloc(COMM_MAXSIZE, MPI_Comm); 
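
For mpi4py, the same two steps could be tried from Python itself before the
extension module is imported. The following is a rough sketch, not something
taken from mpi4py: preload_global is a hypothetical helper, and "libmpi.so"
is simply the name used in the patch above. POSIX specifies that the dlopen
mode include either RTLD_LAZY or RTLD_NOW, so the sketch ORs in RTLD_NOW;
whether a bare RTLD_GLOBAL is accepted depends on the platform's dlopen().

```python
import ctypes
import os


def preload_global(library_name):
    """Try to dlopen() a library with RTLD_GLOBAL; return the handle or None."""
    try:
        return ctypes.CDLL(library_name, mode=os.RTLD_NOW | os.RTLD_GLOBAL)
    except OSError:
        # The library could not be found or loaded on this system.
        return None


if __name__ == "__main__":
    handle = preload_global("libmpi.so")
    if handle is None:
        print("libmpi.so could not be preloaded")
    else:
        print("libmpi.so preloaded with RTLD_GLOBAL")
```

Returning None instead of raising keeps the preload optional, so the same
script still runs on machines where the library sits under another name.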


| is responsible for loading _everything_ into Python, and I am pretty
| sure that  there is no way that people would be willing to change it.
| I am cc'ing this to Lisandro - maybe he has some ideas on this front.

Actually, it looks like you didn't CC him.

Hth, Dirk

| 
| Thanks
| 
| Brian
| 
| On 10/10/07, Brian Barrett <brbar...@open-mpi.org> wrote:
| > On Oct 10, 2007, at 1:27 PM, Dirk Eddelbuettel wrote:
| > > | Does this happen for all MPI programs (potentially only those that
| > > | use the MPI-2 one-sided stuff), or just your R environment?
| > >
| > > This is the likely winner.
| > >
| > > It seems indeed due to R's Rmpi package. Running a simple mpitest.c
| > > shows no
| > > error message. We will look at the Rmpi initialization to see what
| > > could
| > > cause this.
| >
| > Does rmpi link in libmpi.so or dynamically load it at run-time?  The
| > pt2pt one-sided component uses the MPI-1 point-to-point calls for
| > communication (hence, the pt2pt name). If those symbols were
| > unavailable (say, because libmpi.so was dynamically loaded) I could
| > see how this would cause problems.
| >
| > The pt2pt component (rightly) does not have a -lmpi in its link
| > line.  The other components that use symbols in libmpi.so (wrongly)
| > do  have a -lmpi in their link line.  This can cause some problems on
| > some platforms (Linux tends to do dynamic linking / dynamic loading
| > better than most).  That's why only the pt2pt component fails.
| >
| > My guess is that Rmpi is dynamically loading libmpi.so, but not
| > specifying the RTLD_GLOBAL flag.  This means that libmpi.so is not
| > available to the components the way it should be, and all goes
| > downhill from there.  It only mostly works because we do something
| > silly with how we link most of our components, and Linux is just
| > smart enough to cover our rears (thankfully).
| >
| > Solutions:
| >
| >- Someone could make the pt2pt osc component link in libmpi.so
| >  like the rest of the components and hope that no one ever
| >  tries this on a non-friendly platform.
| >- Debian (and all Rmpi users) could configure Open MPI with the
| >   --disable-dlopen flag and ignore the problem.
| >- Someone could fix Rmpi to dlopen libmpi.so with the RTLD_GLOBAL
| >  flag and fix the problem properly.
| >
| > I think it's clear I'm in favor of Option 3.
| >
| > Brian

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] Open MPI 1.2.4 verbosity w.r.t. osc pt2pt

2007-10-10 Thread Dirk Eddelbuettel

Brian,

Man you're good!  :)

On 10 October 2007 at 13:49, Brian Barrett wrote:
| On Oct 10, 2007, at 1:27 PM, Dirk Eddelbuettel wrote:
| > | Does this happen for all MPI programs (potentially only those that
| > | use the MPI-2 one-sided stuff), or just your R environment?
| >
| > This is the likely winner.
| >
| > It seems indeed due to R's Rmpi package. Running a simple mpitest.c  
| > shows no
| > error message. We will look at the Rmpi initialization to see what  
| > could
| > cause this.
| 
| Does rmpi link in libmpi.so or dynamically load it at run-time?  The  

The extension mechanism for the GNU R environment loads at run-time. This is
used by literally hundreds of packages on the CRAN mirrors.

| pt2pt one-sided component uses the MPI-1 point-to-point calls for  
| communication (hence, the pt2pt name). If those symbols were  
| unavailable (say, because libmpi.so was dynamically loaded) I could  
| see how this would cause problems.
| 
| The pt2pt component (rightly) does not have a -lmpi in its link  
| line.  The other components that use symbols in libmpi.so (wrongly)  
| do  have a -lmpi in their link line.  This can cause some problems on  
| some platforms (Linux tends to do dynamic linking / dynamic loading  
| better than most).  That's why only the pt2pt component fails.
| 
| My guess is that Rmpi is dynamically loading libmpi.so, but not  
| specifying the RTLD_GLOBAL flag.  This means that libmpi.so is not  

Spot on. Hao, Rmpi's author, alerted me to the Open MPI FAQ item 24 and
suggested the following patch, which appears to have solved the issue:

--- rmpi-0.5-4.orig/src/Rmpi.c
+++ rmpi-0.5-4/src/Rmpi.c
@@ -16,6 +16,7 @@
  */

 #include "Rmpi.h"
+#include <dlfcn.h>

 static MPI_Comm*comm;
 static MPI_Status *status;
@@ -32,7 +33,9 @@
 if (flag)
return AsInt(1);
else {  
-   MPI_Init((void *)0,(void *)0);
+   char *libm="libmpi.so";
+   dlopen(libm,RTLD_GLOBAL);
+   MPI_Init((void *)0,(void *)0);
MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
MPI_Errhandler_set(MPI_COMM_SELF, MPI_ERRORS_RETURN);
comm=(MPI_Comm *)Calloc(COMM_MAXSIZE, MPI_Comm); 

| available to the components the way it should be, and all goes  
| downhill from there.  It only mostly works because we do something  
| silly with how we link most of our components, and Linux is just  
| smart enough to cover our rears (thankfully).
| 
| Solutions:
| 
|- Someone could make the pt2pt osc component link in libmpi.so
|  like the rest of the components and hope that no one ever
|  tries this on a non-friendly platform.
|- Debian (and all Rmpi users) could configure Open MPI with the
|   --disable-dlopen flag and ignore the problem.
|- Someone could fix Rmpi to dlopen libmpi.so with the RTLD_GLOBAL
|  flag and fix the problem properly.
| 
| I think it's clear I'm in favor of Option 3.

And I think Rmpi's author agrees with you :) This also more or less answers
the question I lobbed at Hao a few minutes ago when I was puzzled why Open
MPI needs this when so many other packages / libraries load cleanly into R.

Many, many thanks!

Dirk, much happier

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] Open MPI 1.2.4 verbosity w.r.t. osc pt2pt

2007-10-10 Thread Dirk Eddelbuettel

Jeff,

Thanks for the reply.  I have gotten much closer, and it looks like all
wounds were self-inflicted.  More below.

On 9 October 2007 at 22:01, Jeff Squyres wrote:
| On Oct 9, 2007, at 3:50 PM, Dirk Eddelbuettel wrote:
| 
| > edd@ron:~$ orterun -n 2 --mca mca_component_show_load_errors 1 r -e  
| > 'library(Rmpi); print(mpi.comm.rank(0))'
| > [ron:18360] mca: base: component_find: unable to open osc pt2pt:  
| > file not found (ignored)
| > [ron:18361] mca: base: component_find: unable to open osc pt2pt:  
| > file not found (ignored)
| 
| Truly odd.  Looking in the code, this error message is displayed when  
| lt_dlopen() of the component fails for some reason (the Libtool  
| portable wrapper library around dlopen() and friends).  We print out  
| the error string that libltdl returns to us, and it's apparently  
| "file not found".  This *usually* refers to the fact that a  
| dependency of the DSO that we're trying to open wasn't found (not  
| that the DSO itself wasn't found).
| 
| Your list of ldd dependencies didn't show anything odd, so I can't  
| imagine why it would get a "file not found" kind of error.
| 
| An off the wall question: are you compiling / building Open MPI on  
| one system and running it on another, where perhaps the dependencies  
| are slightly different and therefore causing a failure?  This is a  
| pretty weak question to ask, because I assume that *many* OMPI  
| components would fail to open if this were the case, but I thought  
| I'd ask anyway...

It's a fair question, but the Debian dependencies are usually good enough.  [
The answer is 'yes and no' as I build what gets onto Debian's mirrors, but
using a standardised chroot whereas I then run it on my normal system. So it is
the same-yet-different machine. And there can be differences, but this is
typically caught by the package management layer. ]

| Another whacky question: does the error happen when you start your  
| test program manually (without mpirun)?

That made no difference.

| Does this happen for all MPI programs (potentially only those that  
| use the MPI-2 one-sided stuff), or just your R environment?

This is the likely winner. 

It seems indeed due to R's Rmpi package. Running a simple mpitest.c shows no
error message. We will look at the Rmpi initialization to see what could
cause this.

| At this point, all I can suggest is firing up a debugger and stepping  
| through the code in lt_dlopenext() to see why exactly it is failing.   

Seems like I avoided that trip to the dentist. ;-)

Moreover, despite my attempts at checking and double checking, my apparent
'works on Debian but not on Ubuntu' was due to a LAM / OpenMPI mix on my
Ubuntu machine at work.  Sorry, that was another false alarm.

| Sorry I don't have a better suggestion than this...  :-\

You were spot-on and most helpful. Thanks a bunch.

Cheers, Dirk

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] Open MPI 1.2.4 verbosity w.r.t. osc pt2pt

2007-10-09 Thread Dirk Eddelbuettel

On 9 October 2007 at 08:08, Jeff Squyres wrote:
| On Oct 7, 2007, at 12:53 AM, Dirk Eddelbuettel wrote:
| 
| > | Not that I can tell.  What else could I test?  The build-logs  
| > don't reveal
| > | anything fishy -- all pt2pt occurrences look fine.  The build itself
| > | proceeded fine (and this was the Debian package build I then uploaded)
| >
| > Two more observations:
| > -- the message does not appear on my Ubuntu system
| > -- but it appears on Hao's Debian machine which does not use the  
| > Debian package
| >
| > Could this be some dynamic loading issue?  How can we go about  
| > solving this?
| 
| I'm disconnected from the network for the moment and can't test a  
| tarball build myself (i.e., I don't have ready access to a  
| distribution tarball), but I think that we disable showing dlopen  
| errors for optimized/tarball builds.  Try running with this MCA  
| parameter:
| 
|mpirun --mca mca_component_show_load_errors 1 ...

Does not reveal much. Using my Debian system:

orterun -n 2 --mca mca_component_show_load_errors 1 r -e 'library(Rmpi);
print(mpi.comm.rank(0))'

edd@ron:~$ orterun -n 2 --mca mca_component_show_load_errors 1 r -e 'library(Rmpi); print(mpi.comm.rank(0))'
[ron:18360] mca: base: component_find: unable to open osc pt2pt: file not found (ignored)
[ron:18361] mca: base: component_find: unable to open osc pt2pt: file not found (ignored)
[1] 0
[1] 1
edd@ron:~$ 

and using my Ubuntu system -- same code, same compile options -- 

foo:~> orterun -n 2 --mca mca_component_show_load_errors 1 r -e
'library(Rmpi); print(mpi.comm.rank(0))'
[1] 0
[1] 1
foo:~>


Dirk

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] Open MPI 1.2.4 verbosity w.r.t. osc pt2pt

2007-10-09 Thread Dirk Eddelbuettel

On 8 October 2007 at 22:06, Brian Granger wrote:
| Also seeing this problem on Fedora Core 5.  Any resolution yet?

No, none.  With the exact same configuration (encoded in the Debian package
build 'recipe' for the package), I get 

-- on Debian:  'unable to open osc pt2pt' verbosity but a working Open MPI setup

-- on Ubuntu:  no verbosity, but Open MPI hangs

Very puzzling.  I have rebuilt many other Debian packages on my Ubuntu
systems and have yet to see any divergence or regression in behaviour.  

Needless to say, I'd like to get this to work, but I do not know what to try next.

Dirk



| Brian
| 
| On 10/6/07, Dirk Eddelbuettel <e...@debian.org> wrote:
| >
| > On 6 October 2007 at 09:36, Dirk Eddelbuettel wrote:
| > |
| > | On 5 October 2007 at 21:31, Brian Barrett wrote:
| > | | On Oct 5, 2007, at 8:48 PM, Dirk Eddelbuettel wrote:
| > | |
| > | | > With the (Debian package of the) current 1.2.4 release, I am seeing
| > | | > a lot of
| > | | >
| > | | >   mca: base: component_find: unable to open osc pt2pt: file not
| > | | > found (ignored)
| > | | >
| > | | > that I'd like to suppress.
| > | | >
| > | | > For these Debian packages, we added a (commented-out by default)
| > | | > entry to
| > | | > suppress the Infiniband noise when no Infiniband hardware is to be
| > | | > found. I
| > | | > would like to suppress this 'osc pt2pt' message too.
| > | | >
| > | | > But all attempts at guestimating parameters for
| > | | >   /etc/openmpi/openmpi-mca-params.conf
| > | | > based on what
| > | | >   ompi_info all all
| > | | > shows failed.  Could someone help me along?
| > | |
| > | | This is a bit different, and points to something bad going on.  The
| > | | error message is that for some reason, a library that the pt2pt
| > | | component depends on was not found.  The pt2pt osc component is
| > |
| > | Uh-oh. Doesn't sound good.
| > |
| > | | entirely built on the MPI layer -- it shouldn't have any external
| > | | dependencies.  Can you run ldd on the library and see if there's
| > | | anything obvious?
| > |
| > | edd@ron:~> ldd /usr/lib/openmpi/mca_osc_pt2pt.so
| > | libnsl.so.1 => /lib/i686/cmov/libnsl.so.1 (0xb7f9a000)
| > | libutil.so.1 => /lib/i686/cmov/libutil.so.1 (0xb7f96000)
| > | libm.so.6 => /lib/i686/cmov/libm.so.6 (0xb7f7)
| > | libpthread.so.0 => /lib/i686/cmov/libpthread.so.0 (0xb7f59000)
| > | libc.so.6 => /lib/i686/cmov/libc.so.6 (0xb7e11000)
| > | /lib/ld-linux.so.2 (0x8000)
| > |
| > | Not that I can tell.  What else could I test?  The build-logs don't reveal
| > | anything fishy -- all pt2pt occurrences look fine.  The build itself
| > | proceeded fine (and this was the Debian package build I then uploaded)
| >
| > Two more observations:
| > -- the message does not appear on my Ubuntu system
| > -- but it appears on Hao's Debian machine which does not use the Debian package
| >
| > Could this be some dynamic loading issue?  How can we go about solving this?
| >
| > Dirk
| >
| > --
| > Three out of two people have difficulties with fractions.

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] Open MPI 1.2.4 verbosity w.r.t. osc pt2pt

2007-10-06 Thread Dirk Eddelbuettel

On 5 October 2007 at 21:31, Brian Barrett wrote:
| On Oct 5, 2007, at 8:48 PM, Dirk Eddelbuettel wrote:
| 
| > With the (Debian package of the) current 1.2.4 release, I am seeing  
| > a lot of
| >
| >   mca: base: component_find: unable to open osc pt2pt: file not  
| > found (ignored)
| >
| > that I'd like to suppress.
| >
| > For these Debian packages, we added a (commented-out by default)  
| > entry to
| > suppress the Infiniband noise when no Infiniband hardware is to be  
| > found. I
| > would like to suppress this 'osc pt2pt' message too.
| >
| > But all attempts at guestimating parameters for
| >   /etc/openmpi/openmpi-mca-params.conf
| > based on what
| >   ompi_info all all
| > shows failed.  Could someone help me along?
| 
| This is a bit different, and points to something bad going on.  The  
| error message is that for some reason, a library that the pt2pt  
| component depends on was not found.  The pt2pt osc component is  

Uh-oh. Doesn't sound good.

| entirely built on the MPI layer -- it shouldn't have any external  
| dependencies.  Can you run ldd on the library and see if there's  
| anything obvious?

edd@ron:~> ldd /usr/lib/openmpi/mca_osc_pt2pt.so
libnsl.so.1 => /lib/i686/cmov/libnsl.so.1 (0xb7f9a000)
libutil.so.1 => /lib/i686/cmov/libutil.so.1 (0xb7f96000)
libm.so.6 => /lib/i686/cmov/libm.so.6 (0xb7f7)
libpthread.so.0 => /lib/i686/cmov/libpthread.so.0 (0xb7f59000)
libc.so.6 => /lib/i686/cmov/libc.so.6 (0xb7e11000)
/lib/ld-linux.so.2 (0x8000)

Not that I can tell.  What else could I test?  The build-logs don't reveal
anything fishy -- all pt2pt occurrences look fine.  The build itself
proceeded fine (and this was the Debian package build I then uploaded).

Dirk

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] Open MPI v1.2.4 released

2007-09-27 Thread Dirk Eddelbuettel

On 26 September 2007 at 13:37, Francesco Pietra wrote:
| Are there any detailed directions for upgrading (for common guys, not experts, I
| mean)? My 1.2.3 version on Debian Linux amd64 runs perfectly.

How about

sudo apt-get update; sudo apt-get dist-upgrade

provided you point to Debian unstable which got 1.2.4 yesterday; ports for
alpha, amd64, ia64, powerpc are already available too.

Dirk
part of Debian's pkg-openmpi team

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] orte_pls_base_select fails

2007-07-19 Thread Dirk Eddelbuettel

On 18 July 2007 at 19:14, Dirk Eddelbuettel wrote:
| 
| Hi Tim,
| 
| Thanks for the follow-up
| 
| On 18 July 2007 at 17:22, Tim Prins wrote:
| | 
| | > Yes, this helps tremendously.  I installed rsh, and now it pretty much
| | > works.
| | Glad this worked out for you.
| | 
| | >
| | > The one missing detail is that I can't seem to get the stdout/stderr
| | > output.  For example:
| | >
| | > $ orterun -np 1 uptime
| | > $ uptime
| | > 18:24:27 up 13 days,  3:03,  0 users,  load average: 0.00, 0.03, 0.00
| | >
| | > The man page indicates that stdout/stderr is supposed to come back to
| | > the stdout/stderr of the orterun process.  Any ideas on why this isn't
| | > working?
| | It should work. However, we currently have some I/O forwarding problems which
| | show up in some environments that will (hopefully) be fixed in the next
| | release. As far as I know, the problem seems to happen mostly with non-mpi 
| | applications.
| | 
| | Try running a simple mpi application, such as:
| | 
| | #include <stdio.h>
| | #include "mpi.h"
| | 
| | int main(int argc, char* argv[])
| | {
| |     int rank, size;
| | 
| |     MPI_Init(&argc, &argv);
| |     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
| |     MPI_Comm_size(MPI_COMM_WORLD, &size);
| |     printf("Hello, world, I am %d of %d\n", rank, size);
| |     MPI_Finalize();
| | 
| |     return 0;
| | }
| | 
| | If that works fine, then it is probably our problem, and not a problem with 
| | your setup.
| | 
| | Sorry I don't have a better answer :(
| 
| That works (and I use the same Debian openmpi 1.2.3-1 set of packages Adam
| has): 
| 
| edd@basebud:~> opalcc -o /tmp/openmpitest /tmp/openmpitest.c -lmpi
| edd@basebud:~> orterun -np 4 /tmp/openmpitest
| Hello, world, I am 2 of 4
| Hello, world, I am 1 of 4
| Hello, world, I am 0 of 4
| Hello, world, I am 3 of 4
| edd@basebud:~>
| 
| I was toying with this at work earlier, and it was hanging there (using
| hostname or uptime as the token binaries) as soon as I increased the np
| parameter beyond 1. 
| 
| It works here:
| 
| edd@basebud:~> orterun -np 4 hostname
| basebud
| basebud
| basebud
| basebud
| edd@basebud:~>
| 
| I have slurm-llnl test packages installed at work but not here. Maybe I need
| to dig a bit more into slurm.  (Adam: slurm package should be forthcoming.
| I can point you to the snapshots from the fellow whom I mentor on this.)

Indeed, at work it hangs once I up the np parameter:

foo:~> orterun -np 4 ./openmpitest
Hello, world, I am 0 of 4
Hello, world, I am 1 of 4
Hello, world, I am 2 of 4
Hello, world, I am 3 of 4
orterun: killing job...

Killed
foo:~> orterun -np 4 -H localhost ./openmpitest
Hello, world, I am 1 of 4
Hello, world, I am 0 of 4
Hello, world, I am 2 of 4
Hello, world, I am 3 of 4
foo:~> 

Restricting it to localhost helps.  Any ideas?

x86 multicore/multicpu, Open MPI 1.2.3, Slurm 1.2.11, Ubuntu 7.04 plus a
handful of hand-compiled packages from Debian unstable. More details are
available; just tell me what is needed and how best to compile it.

Dirk

-- 
Hell, there are no rules here - we're trying to accomplish something. 
  -- Thomas A. Edison