Re: [OMPI users] OpenMPI, debugging, and Portland Group's pgdbg

2006-06-16 Thread Jeff Squyres (jsquyres)
I'm afraid that I'm not familiar with the PG debugger, so I don't know
how it is supposed to be launched.

The intent with --debugger / --debug is that a single invocation of one
command launches the parallel debugger and tells that debugger to launch
your parallel MPI process (presumably allowing the parallel debugger to
attach to your parallel MPI process).  This is what fx2 and Totalview
allow, for example.

As such, the "--debug" option is simply syntactic sugar for invoking
another [perhaps non-obvious] command.  We figured it was simpler for
users to add "--debug" to the already-familiar mpirun command line than
to learn a new syntax for invoking a debugger (although both would
certainly work equally well).

So when OMPI's mpirun sees "--debug", it ends up exec'ing
something else -- the parallel debugger command.  In the example that I
gave in http://www.open-mpi.org/community/lists/users/2005/11/0370.php,
mpirun looked for two things in your path: totalview and fx2.

For example, if you did this:

mpirun --debug -np 4 a.out

If it found totalview, it would end up exec'ing:

totalview @mpirun@ -a @mpirun_args@ 
which would get substituted to
totalview mpirun -a -np 4 a.out

(note the additional "-a"), which is the totalview command-line syntax to
launch their debugger and tell it to launch your parallel process.  If
totalview is not found in your path, mpirun will look for fx2.  If fx2 is
found, it'll invoke:

fx2 @mpirun@ -a @mpirun_args@ 
which would get substituted to
fx2 mpirun -a -np 4 a.out

You can see that fx2's syntax was probably influenced by totalview's.  

So what you need is the command line that tells pgdbg to do the same
thing -- launch your app and attach to it.  You can then substitute that
into the "--debugger" option (using the @mpirun@ and @mpirun_args@
tokens), or set the MCA parameter "orte_base_user_debugger", and then
use --debug.  For example, if the pgdbg syntax is similar to that of
totalview and fx2, then you could do the following:

mpirun --debugger "pgdbg @mpirun@ -a @mpirun_args@" --debug -np 4 a.out

or (assuming tcsh)

shell% setenv OMPI_MCA_orte_base_user_debugger "pgdbg @mpirun@ -a @mpirun_args@"
shell% mpirun --debug -np 4 a.out

Make sense?

If you find a command-line format that works for pgdbg, we'd be happy to
add it to the default value of the orte_base_user_debugger MCA parameter.
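
As an aside, if you want that setting to stick without an environment
variable, the same value can go into an MCA parameter file.  This is only
a sketch -- it assumes pgdbg really does take totalview-style "-a"
arguments, which you'd need to confirm against the pgdbg docs:

 # in $HOME/.openmpi/mca-params.conf
 orte_base_user_debugger = pgdbg @mpirun@ -a @mpirun_args@

With that in place, a plain "mpirun --debug -np 4 a.out" should pick up
pgdbg without any extra command-line options.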

Note that OMPI currently only supports the Totalview API for attaching
to MPI processes -- I don't know if pgdbg requires something else.


> -Original Message-
> From: users-boun...@open-mpi.org 
> [mailto:users-boun...@open-mpi.org] On Behalf Of Caird, Andrew J
> Sent: Tuesday, June 13, 2006 4:38 PM
> To: us...@open-mpi.org
> Subject: [OMPI users] OpenMPI, debugging, and Portland Group's pgdbg
> 
> Hello all,
> 
> I've read the thread "OpenMPI debugging support"
> (http://www.open-mpi.org/community/lists/users/2005/11/0370.ph
> p) and it
> looks like there is improved debugging support for debuggers 
> other than
> TV in the 1.1 series.
> 
> I'd like to use Portland Group's pgdbg.  It's a parallel debugger;
> there's more information at http://www.pgroup.com/resources/docs.htm.
> 
> From the previous thread on this topic, it looks to me like 
> the plan for
> 1.1 and forward is to support the ability to launch the 
> debugger "along
> side" the application.  I don't know enough about either pgdbg or
> OpenMPI to know if this is the best plan, but assuming that it is, is
> there a way to see if it is happening?
> 
> I've tried this two ways, the first way doesn't seem to attach to
> anything:
> --
> --
> 
> [acaird@nyx-login ~]$ ompi_info | head -2
> Open MPI: 1.1a9r10177
>Open MPI SVN revision: r10177
> [acaird@nyx-login ~]$ mpirun --debugger pgdbg --debug  -np 2 cpi
> PGDBG 6.1-3 x86-64 (Cluster, 64 CPU)
> Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved.
> Copyright 2000-2005, STMicroelectronics, Inc. All Rights Reserved.
> PGDBG cannot open a window; check the DISPLAY environment variable.
> Entering text mode.
> 
> pgdbg> list
> ERROR: No current thread.
> 
> pgdbg> quit
> --
> --
> 
> 
> and I've tried running the whole thing under pgdbg:
> --
> --
> 
> [acaird@nyx-login ~]$ pgdbg mpirun -np 2 cpi -s pgdbgscript
>   { lots of mca_* loaded by ld-linux messages }
> pgserv 8726: attach : attach 8720 fails
> ERROR: New Process (PID 8720, HOST localhost) ATTACH FAILED.
> ERROR: New Process (PID 8720, HOST localhost) IGNORED.
> ERROR: cannot read value at address 0x59BFE8.
> ERROR: cannot read value at address 0x59BFF0.
> ERROR: cannot read value at address 0x59BFF8.
> ERROR: New Process (PID 0, HOST unknown) IGNORED.
> ERROR: cannot read value at address 

Re: [OMPI users] pls:rsh: execv failed with errno=2

2006-06-16 Thread Jeff Squyres (jsquyres)
Sorry for jumping in late...

The /lib vs. /lib64 thing as part of --prefix was definitely broken until 
recently.  This behavior has been fixed in the 1.1 series.  Specifically, OMPI 
will take the prefix that you provided and append the basename of the local 
$libdir.  So if you configured OMPI with something like:

 shell$ ./configure --libdir=/some/path/lib64 ...

And then you run:

 shell$ mpirun --prefix /some/path ...

Then OMPI will add /some/path/lib64 to the remote LD_LIBRARY_PATH.  The 
previous behavior would always add "/lib" to the remote LD_LIBRARY_PATH, 
regardless of what the local $libdir was (i.e., it ignored the basename of your 
$libdir).  

If you have a situation more complicated than this (e.g., your $libdir is 
different than your prefix by more than just the basename), then --prefix is 
not the solution for you.  Instead, you'll need to set your $PATH and 
$LD_LIBRARY_PATH properly on all nodes (e.g., in your shell startup files). 
Specifically, --prefix is meant to be an easy workaround for common 
configurations where $libdir is a subdirectory under $prefix.
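
For example (just a sketch -- /some/path matches the configure example
above, and the exact startup file depends on your shell and on how your
remote shells are started), each node's startup file would contain
something like:

 # e.g., in ~/.bashrc on every node
 export PATH=/some/path/bin:$PATH
 export LD_LIBRARY_PATH=/some/path/lib64:$LD_LIBRARY_PATH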

Another random note: invoking mpirun with an absolute path (e.g., 
/path/to/bin/mpirun) is exactly the same as specifying --prefix /path/to -- so 
you don't have to do both.
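
Putting those two behaviors together, a 1.1-series sketch (paths are
illustrative only) would look like:

 shell$ ./configure --prefix=/some/path --libdir=/some/path/lib64 ...
 shell$ make all install
 shell$ /some/path/bin/mpirun -np 2 ./a.out

That last line behaves like "mpirun --prefix /some/path -np 2 ./a.out":
the remote nodes get /some/path/bin added to their PATH and
/some/path/lib64 (the basename of the local $libdir appended to the
prefix) added to their LD_LIBRARY_PATH.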


> -Original Message-
> From: users-boun...@open-mpi.org 
> [mailto:users-boun...@open-mpi.org] On Behalf Of Eric Thibodeau
> Sent: Friday, June 16, 2006 11:47 AM
> To: pak@sun.com; Open MPI Users
> Subject: Re: [OMPI users] pls:rsh: execv failed with errno=2
> 
> Thanks for pointing out the LD_LIBRARY_PATH64 ... that explains much. 
> As for the original error, I am still "a duck out of water". I will 
> try the 1.1rxxx trunk though (creating an ebuild for it as we speak).
> 
> Eric
> 
> On Friday 16 June 2006 11:44, Pak Lui wrote:
> > Hi Eric,
> > 
> > I started to see what you are saying. You tried to point out that you 
> > set libdir to lib64 instead of just lib and somehow it doesn't get 
> > picked up.
> > 
> > I personally have not tried this option, so I don't think I can help 
> > you much here. But I saw that there are changes in the rsh pls module 
> > for the trunk and 1.1 versions (r9930, r9931, r10207, r10214) that may 
> > solve your lib64 issue. If you do ldd on a.out, it will show the 
> > libraries it is linked against. Other than that, setting 
> > LD_LIBRARY_PATH64 shouldn't make a difference either.
> > 
> > I am not sure if others can help you on this.
> > 
> > Eric Thibodeau wrote:
> > > Hello,
> > > 
> > > I don't want to get too much off topic in this reply but you're 
> > > bringing out a point here. I am unable to run mpi apps on the AMD64 
> > > platform with the regular exporting of $LD_LIBRARY_PATH and $PATH, 
> > > this is why I have no choice but to revert to using the --prefix 
> > > approach. Here are a few execution examples to demonstrate my point:
> > > 
> > > kyron@headless ~ $ /usr/lib64/openmpi/1.0.2-gcc-4.1/bin/mpirun --prefix 
> > > /usr/lib64/openmpi/1.0.2-gcc-4.1/ -np 2 ./a.out
> > > ./a.out: error while loading shared libraries: libmpi.so.0: cannot open 
> > > shared object file: No such file or directory
> > > 
> > > kyron@headless ~ $ /usr/lib64/openmpi/1.0.2-gcc-4.1/bin/mpirun --prefix 
> > > /usr/lib64/openmpi/1.0.2-gcc-4.1/lib64/ -np 2 ./a.out
> > > [headless:10827] pls:rsh: execv failed with errno=2
> > > [headless:10827] ERROR: A daemon on node localhost failed to start as 
> > > expected.
> > > [headless:10827] ERROR: There may be more information available from
> > > [headless:10827] ERROR: the remote shell (see above).
> > > [headless:10827] ERROR: The daemon exited unexpectedly with status 255.
> > > 
> > > kyron@headless ~ $ cat opmpi64.sh
> > > #!/bin/bash
> > > MPI_BASE='/usr/lib64/openmpi/1.0.2-gcc-4.1'
> > > export PATH=$PATH:${MPI_BASE}/bin
> > > LD_LIBRARY_PATH=${MPI_BASE}/lib64
> > > 
> > > kyron@headless ~ $ . opmpi64.sh
> > > kyron@headless ~ $ mpirun -np 2 ./a.out
> > > ./a.out: error while loading shared libraries: libmpi.so.0: cannot open 
> > > shared object file: No such file or directory
> > > kyron@headless ~ $
> > > 
> > > Eric
> > > 
> > > On Friday 16 June 2006 10:31, Pak Lui wrote:
> > > 
> > >  > Hi, I noticed your prefix set to the lib dir, can you try without the
> > >  > lib64 part and rerun?
> > >  >
> > >  > Eric Thibodeau wrote:
> > >  > > Hello everyone,
> > >  > >
> > >  > > Well, first off, I hope this problem I am reporting is of some validity,
> > >  > > I tried finding similar situations off Google and the mailing list but
> > >  > > came up with only one reference [1] which seems invalid in my case since
> > >  > > all executions are local (naïve assumptions that it makes a difference

Re: [OMPI users] pls:rsh: execv failed with errno=2

2006-06-16 Thread Pak Lui

Hi Eric,

I started to see what you are saying. You tried to point out that you 
set libdir to lib64 instead of just lib and somehow it doesn't get 
picked up.

I personally have not tried this option, so I don't think I can help 
you much here. But I saw that there are changes in the rsh pls module 
for the trunk and 1.1 versions (r9930, r9931, r10207, r10214) that may 
solve your lib64 issue. If you do ldd on a.out, it will show the 
libraries it is linked against. Other than that, setting 
LD_LIBRARY_PATH64 shouldn't make a difference either.


I am not sure if others can help you on this.

Eric Thibodeau wrote:

Hello,

I don't want to get too much off topic in this reply but you're bringing 
out a point here. I am unable to run mpi apps on the AMD64 platform with 
the regular exporting of $LD_LIBRARY_PATH and $PATH, this is why I have 
no choice but to revert to using the --prefix approach. Here are a few 
execution examples to demonstrate my point:


kyron@headless ~ $ /usr/lib64/openmpi/1.0.2-gcc-4.1/bin/mpirun --prefix 
/usr/lib64/openmpi/1.0.2-gcc-4.1/ -np 2 ./a.out


./a.out: error while loading shared libraries: libmpi.so.0: cannot open 
shared object file: No such file or directory


kyron@headless ~ $ /usr/lib64/openmpi/1.0.2-gcc-4.1/bin/mpirun --prefix 
/usr/lib64/openmpi/1.0.2-gcc-4.1/lib64/ -np 2 ./a.out


[headless:10827] pls:rsh: execv failed with errno=2

[headless:10827] ERROR: A daemon on node localhost failed to start as 
expected.


[headless:10827] ERROR: There may be more information available from

[headless:10827] ERROR: the remote shell (see above).

[headless:10827] ERROR: The daemon exited unexpectedly with status 255.

kyron@headless ~ $ cat opmpi64.sh

#!/bin/bash

MPI_BASE='/usr/lib64/openmpi/1.0.2-gcc-4.1'

export PATH=$PATH:${MPI_BASE}/bin

LD_LIBRARY_PATH=${MPI_BASE}/lib64

kyron@headless ~ $ . opmpi64.sh

kyron@headless ~ $ mpirun -np 2 ./a.out

./a.out: error while loading shared libraries: libmpi.so.0: cannot open 
shared object file: No such file or directory


kyron@headless ~ $

Eric

On Friday 16 June 2006 10:31, Pak Lui wrote:

 > Hi, I noticed your prefix set to the lib dir, can you try without the
 > lib64 part and rerun?
 >
 > Eric Thibodeau wrote:
 > > Hello everyone,
 > >
 > > Well, first off, I hope this problem I am reporting is of some validity,
 > > I tried finding similar situations off Google and the mailing list but
 > > came up with only one reference [1] which seems invalid in my case since
 > > all executions are local (naïve assumptions that it makes a difference
 > > on the calling stack). I am trying to run a simple HelloWorld using
 > > OpenMPI 1.0.2 on an AMD64 machine and a Sun Enterprise (12 procs)
 > > machine. In both cases I get the following error:
 > >
 > > pls:rsh: execv failed with errno=2
 > >
 > > Here is the mpirun -d trace when running my HelloWorld (on AMD64):
 > >
 > > kyron@headless ~ $ mpirun -d --prefix
 > > /usr/lib64/openmpi/1.0.2-gcc-4.1/lib64/ -np 4 ./hello
 > >
 > > [headless:10461] procdir: (null)
 > > [headless:10461] jobdir: (null)
 > > [headless:10461] unidir:
 > > /tmp/openmpi-sessions-kyron@headless_0/default-universe
 > > [headless:10461] top: openmpi-sessions-kyron@headless_0
 > > [headless:10461] tmp: /tmp
 > > [headless:10461] [0,0,0] setting up session dir with
 > > [headless:10461] tmpdir /tmp
 > > [headless:10461] universe default-universe-10461
 > > [headless:10461] user kyron
 > > [headless:10461] host headless
 > > [headless:10461] jobid 0
 > > [headless:10461] procid 0
 > > [headless:10461] procdir:
 > > /tmp/openmpi-sessions-kyron@headless_0/default-universe-10461/0/0
 > > [headless:10461] jobdir:
 > > /tmp/openmpi-sessions-kyron@headless_0/default-universe-10461/0
 > > [headless:10461] unidir:
 > > /tmp/openmpi-sessions-kyron@headless_0/default-universe-10461
 > > [headless:10461] top: openmpi-sessions-kyron@headless_0
 > > [headless:10461] tmp: /tmp
 > > [headless:10461] [0,0,0] contact_file
 > > /tmp/openmpi-sessions-kyron@headless_0/default-universe-10461/universe-setup.txt
 > > [headless:10461] [0,0,0] wrote setup file
 > > [headless:10461] spawn: in job_state_callback(jobid = 1, state = 0x1)
 > > [headless:10461] pls:rsh: local csh: 0, local bash: 1
 > > [headless:10461] pls:rsh: assuming same remote shell as local shell
 > > [headless:10461] pls:rsh: remote csh: 0, remote bash: 1
 > > [headless:10461] pls:rsh: final template argv:
 > > [headless:10461] pls:rsh: /usr/bin/ssh  orted --debug
 > > --bootproxy 1 --name  --num_procs 2 --vpid_start 0 --nodename
 > >  --universe kyron@headless:default-universe-10461 --nsreplica
 > > "0.0.0;tcp://142.137.135.124:37657;tcp://192.168.1.1:37657" --gprreplica
 > > "0.0.0;tcp://142.137.135.124:37657;tcp://192.168.1.1:37657"

Re: [OMPI users] pls:rsh: execv failed with errno=2

2006-06-16 Thread Eric Thibodeau
Hello,
I don't want to get too much off topic in this reply but you're 
bringing out a point here. I am unable to run mpi apps on the AMD64 platform 
with the regular exporting of $LD_LIBRARY_PATH and $PATH, this is why I have no 
choice but to revert to using the --prefix approach. Here are a few execution 
examples to demonstrate my point:

kyron@headless ~ $ /usr/lib64/openmpi/1.0.2-gcc-4.1/bin/mpirun --prefix 
/usr/lib64/openmpi/1.0.2-gcc-4.1/ -np 2 ./a.out
./a.out: error while loading shared libraries: libmpi.so.0: cannot open shared 
object file: No such file or directory
kyron@headless ~ $ /usr/lib64/openmpi/1.0.2-gcc-4.1/bin/mpirun --prefix 
/usr/lib64/openmpi/1.0.2-gcc-4.1/lib64/ -np 2 ./a.out
[headless:10827] pls:rsh: execv failed with errno=2
[headless:10827] ERROR: A daemon on node localhost failed to start as expected.
[headless:10827] ERROR: There may be more information available from
[headless:10827] ERROR: the remote shell (see above).
[headless:10827] ERROR: The daemon exited unexpectedly with status 255.
kyron@headless ~ $ cat opmpi64.sh
#!/bin/bash
MPI_BASE='/usr/lib64/openmpi/1.0.2-gcc-4.1'
export PATH=$PATH:${MPI_BASE}/bin
LD_LIBRARY_PATH=${MPI_BASE}/lib64
kyron@headless ~ $ . opmpi64.sh
kyron@headless ~ $ mpirun -np 2 ./a.out
./a.out: error while loading shared libraries: libmpi.so.0: cannot open shared 
object file: No such file or directory
kyron@headless ~ $

Eric

On Friday 16 June 2006 10:31, Pak Lui wrote:
> Hi, I noticed your prefix set to the lib dir, can you try without the 
> lib64 part and rerun?
> 
> Eric Thibodeau wrote:
> > Hello everyone,
> > 
> > Well, first off, I hope this problem I am reporting is of some validity, 
> > I tried finding similar situations off Google and the mailing list but 
> > came up with only one reference [1] which seems invalid in my case since 
> > all executions are local (naïve assumptions that it makes a difference 
> > on the calling stack). I am trying to run a simple HelloWorld using 
> > OpenMPI 1.0.2 on an AMD64 machine and a Sun Enterprise (12 procs) 
> > machine. In both cases I get the following error:
> > 
> > pls:rsh: execv failed with errno=2
> > 
> > Here is the mpirun -d trace when running my HelloWorld (on AMD64):
> > 
> > kyron@headless ~ $ mpirun -d --prefix 
> > /usr/lib64/openmpi/1.0.2-gcc-4.1/lib64/ -np 4 ./hello
> > 
> > [headless:10461] procdir: (null)
> > 
> > [headless:10461] jobdir: (null)
> > 
> > [headless:10461] unidir: 
> > /tmp/openmpi-sessions-kyron@headless_0/default-universe
> > 
> > [headless:10461] top: openmpi-sessions-kyron@headless_0
> > 
> > [headless:10461] tmp: /tmp
> > 
> > [headless:10461] [0,0,0] setting up session dir with
> > 
> > [headless:10461] tmpdir /tmp
> > 
> > [headless:10461] universe default-universe-10461
> > 
> > [headless:10461] user kyron
> > 
> > [headless:10461] host headless
> > 
> > [headless:10461] jobid 0
> > 
> > [headless:10461] procid 0
> > 
> > [headless:10461] procdir: 
> > /tmp/openmpi-sessions-kyron@headless_0/default-universe-10461/0/0
> > 
> > [headless:10461] jobdir: 
> > /tmp/openmpi-sessions-kyron@headless_0/default-universe-10461/0
> > 
> > [headless:10461] unidir: 
> > /tmp/openmpi-sessions-kyron@headless_0/default-universe-10461
> > 
> > [headless:10461] top: openmpi-sessions-kyron@headless_0
> > 
> > [headless:10461] tmp: /tmp
> > 
> > [headless:10461] [0,0,0] contact_file 
> > /tmp/openmpi-sessions-kyron@headless_0/default-universe-10461/universe-setup.txt
> > 
> > [headless:10461] [0,0,0] wrote setup file
> > 
> > [headless:10461] spawn: in job_state_callback(jobid = 1, state = 0x1)
> > 
> > [headless:10461] pls:rsh: local csh: 0, local bash: 1
> > 
> > [headless:10461] pls:rsh: assuming same remote shell as local shell
> > 
> > [headless:10461] pls:rsh: remote csh: 0, remote bash: 1
> > 
> > [headless:10461] pls:rsh: final template argv:
> > 
> > [headless:10461] pls:rsh: /usr/bin/ssh  orted --debug 
> > --bootproxy 1 --name  --num_procs 2 --vpid_start 0 --nodename 
> >  --universe kyron@headless:default-universe-10461 --nsreplica 
> > "0.0.0;tcp://142.137.135.124:37657;tcp://192.168.1.1:37657" --gprreplica 
> > "0.0.0;tcp://142.137.135.124:37657;tcp://192.168.1.1:37657" 
> > --mpi-call-yield 0
> > 
> > [headless:10461] pls:rsh: launching on node localhost
> > 
> > [headless:10461] pls:rsh: oversubscribed -- setting mpi_yield_when_idle 
> > to 1 (1 4)
> > 
> > [headless:10461] pls:rsh: localhost is a LOCAL node
> > 
> > [headless:10461] pls:rsh: reset PATH: 
> > /usr/lib64/openmpi/1.0.2-gcc-4.1/lib64/bin:/usr/local/bin:/usr/bin:/bin:/usr/x86_64-pc-linux-gnu/gcc-bin/4.1.1:/opt/c3-4:/usr/qt/3/bin:/usr/lib64/openmpi/1.0.2-gcc-4.1/bin
> > 
> > [headless:10461] pls:rsh: reset LD_LIBRARY_PATH: 
> > /usr/lib64/openmpi/1.0.2-gcc-4.1/lib64/lib
> > 
> > [headless:10461] pls:rsh: changing to directory /home/kyron
> > 
> > [headless:10461] pls:rsh: executing: orted --debug --bootproxy 1 --name 
> > 0.0.1 --num_procs 2 --vpid_start 0 --nodename localhost 

Re: [OMPI users] Openmpi 1.0.3svn10374 not launching apps through TM interface

2006-06-16 Thread Martin Schafföner
On Friday 16 June 2006 16:03, Jeff Squyres (jsquyres) wrote:
> Can you repeat this and run a non-MPI executable such as "hostname"?  I
> want to take MPI out of the equation and just test the launching system.

Sorry, forgot that. So I executed 

mpiexec -np 1 -d --mca pls tm --mca pls_tm_debug 1 -x PATH --prefix /opt/openmpi hostname > openmpi1.log 2>&1

and the log is attached.

> Can you also verify that you are correctly matching your OMPI installations
> on all nodes?  I.e., that you've got Open MPI installed in the same
> location on all nodes in your cluster, and your PATH and LD_LIBRARY_PATH
> are pointing to the version of Open MPI that you intend to use.

Well, all the nodes share the installation through NFS. The PATH is entered 
into $HOME/.bashrc, but I also export the PATH variable as you can see above. 
LD_LIBRARY_PATH is not necessary as openmpi was only built with static 
libraries and libtorque.so can be found through /etc/ld.so.cache on all 
nodes.

> > (could it be
> > that the mailing list has problems with utf8 posts?):
>
> I actually got both your session logs ok...?

In my first two posts, all the linebreaks were missing. But if you say it was 
okay for you then I won't worry any longer!

Regards,
-- 
Martin Schafföner

Cognitive Systems Group, Institute of Electronics, Signal Processing and 
Communication Technologies, Department of Electrical Engineering, 
Otto-von-Guericke University Magdeburg
Phone: +49 391 6720063


openmpi1.log.gz
Description: GNU Zip compressed data


Re: [OMPI users] OpenMPI C++ examples of user defined MPI types(inherited classes)?

2006-06-16 Thread Jeff Squyres (jsquyres)
It's on the to-do list to include some simple MPI examples in Open MPI.

Here's a sample C++ MPI program from the LAM MPI test suite.  It creates
a bunch of different derived datatypes with the C++ MPI API:

https://sourcehaven.osl.iu.edu/svn/trillium/trunk/lamtests/dtyp/datatype_cxx.cc

Although it doesn't use these to send/receive, you can just use these
datatypes as the datatype argument in the various Send and Recv
functions with corresponding buffers that match the datatype.
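
For what it's worth, here is a minimal sketch of the same idea -- the
struct, its field names, and the rank/tag values are made up for
illustration, not taken from that test suite -- showing a derived datatype
built with the C++ bindings and then used directly as the datatype
argument to Send/Recv:

  #include <mpi.h>

  struct Particle { double x, y, z; int id; };

  int main(int argc, char *argv[]) {
      MPI::Init(argc, argv);
      int rank = MPI::COMM_WORLD.Get_rank();

      // Describe Particle to MPI: one block of 3 doubles and one block of
      // 1 int, at whatever offsets the compiler actually chose.
      Particle p;
      MPI::Datatype types[2] = { MPI::DOUBLE, MPI::INT };
      int blocklens[2] = { 3, 1 };
      MPI::Aint disps[2];
      disps[0] = MPI::Get_address(&p.x)  - MPI::Get_address(&p);
      disps[1] = MPI::Get_address(&p.id) - MPI::Get_address(&p);

      MPI::Datatype ptype =
          MPI::Datatype::Create_struct(2, blocklens, disps, types);
      ptype.Commit();

      if (rank == 0) {
          p.x = 1.0; p.y = 2.0; p.z = 3.0; p.id = 42;
          MPI::COMM_WORLD.Send(&p, 1, ptype, 1, 0);   // one Particle to rank 1
      } else if (rank == 1) {
          MPI::COMM_WORLD.Recv(&p, 1, ptype, 0, 0);   // received with same type
      }

      ptype.Free();
      MPI::Finalize();
      return 0;
  }

Run it with at least two processes (e.g., "mpirun -np 2 ./a.out"); the
committed datatype works the same way for collectives.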


> -Original Message-
> From: users-boun...@open-mpi.org 
> [mailto:users-boun...@open-mpi.org] On Behalf Of Jose Quiroga
> Sent: Tuesday, June 13, 2006 8:54 PM
> To: us...@open-mpi.org
> Subject: [OMPI users] OpenMPI C++ examples of user defined 
> MPI types(inherited classes)?
> 
> 
> Hi everybody,
> 
> Can anyone point me to some little C++ examples from
> which to get the main idea of sending/receiving
> messages containing user defined MPI types (MPI
> inherited classes?)?
> 
> Thanks a lot.
> 
> JLQ.
> MPI and OpenMPI newbie.
> 
> 
> 



Re: [OMPI users] Openmpi 1.0.3svn10374 not launching apps through TM interface

2006-06-16 Thread Jeff Squyres (jsquyres)
> -Original Message-
> From: users-boun...@open-mpi.org 
> [mailto:users-boun...@open-mpi.org] On Behalf Of Martin Schafföner
> Sent: Friday, June 16, 2006 9:50 AM
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] Openmpi 1.0.3svn10374 not launching 
> apps through TM interface
> 
> > (inside a PBS job)
> > pbsdsh - -v hostname

As you indicated, this seems to work for you.  Good.

> > (inside a PBS job)
> > mpirun -np  -d --mca pls_tm_debug 1 hostname

> So, #1 works (I know because we're constantly using pbsdsh and OSC's 
> mpiexec for mpich-type implementations). #2 doesn't work; I'll repeat 
> the session log from my first post, this time (hopefully!!!) with 
> linebreaks 

Can you repeat this and run a non-MPI executable such as "hostname"?  I want to 
take MPI out of the equation and just test the launching system.

Can you also verify that you are correctly matching your OMPI installations on 
all nodes?  I.e., that you've got Open MPI installed in the same location on 
all nodes in your cluster, and your PATH and LD_LIBRARY_PATH are pointing to 
the version of Open MPI that you intend to use.

> (could it be 
> that the mailing list has problems with utf8 posts?):

I actually got both your session logs ok...?

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems



Re: [OMPI users] Openmpi 1.0.3svn10374 not launching apps through TM interface

2006-06-16 Thread Martin Schafföner
On Friday 16 June 2006 15:00, Jeff Squyres (jsquyres) wrote:
> Try two things:
>
> 1. Use the pbsdsh command to try to launch a trivial non-MPI application
> (e.g., hostname):
>
> (inside a PBS job)
> pbsdsh - -v hostname
>
> where  is the number of vcpu's in your job.
>
> 2. If that works, try mpirun'ing a trivial non-MPI application (e.g.,
> hostname):
>
> (inside a PBS job)
> mpirun -np  -d --mca pls_tm_debug 1 hostname
>
> If #1 fails, then there is something wrong with your Torque installation
> (pbsdsh uses the same PBS API that Open MPI does), and Open MPI's failure
> is a symptom of the underlying problem.  If #1 succeeds and #2 fails, send
> back the details and let's go from there.

So, #1 works (I know because we're constantly using pbsdsh and OSC's mpiexec 
for mpich-type implementations). #2 doesn't work; I'll repeat the session log 
from my first post, this time (hopefully!!!) with linebreaks (could it be 
that the mailing list has problems with utf8 posts?):

schaffoe@node16:~/tmp/mpitest> mpiexec -np 1 --mca pls_tm_debug 1 --mca pls tm 
`pwd`/openmpitest
[node16:03113] pls:tm: final top-level argv:
[node16:03113] pls:tm:     orted --no-daemonize --bootproxy 1 --name  
--num_procs 2 --vpid_start 0 --nodename  --universe 
schaffoe@node16:default-universe-3113 --nsreplica 
"0.0.0;tcp://192.168.1.16:60601" --gprreplica 
"0.0.0;tcp://192.168.1.16:60601"
[node16:03113] pls:tm: launching on node node16
[node16:03113] pls:tm: found /opt/openmpi/bin/orted
[node16:03113] pls:tm: not oversubscribed -- setting mpi_yield_when_idle to 0
[node16:03113] pls:tm: executing: orted --no-daemonize --bootproxy 1 --name 
0.0.1 --num_procs 2 --vpid_start 0 --nodename node16 --universe 
schaffoe@node16:default-universe-3113 --nsreplica 
"0.0.0;tcp://192.168.1.16:60601" --gprreplica 
"0.0.0;tcp://192.168.1.16:60601"
[node16:03113] pls:tm: final top-level argv:
[node16:03113] pls:tm:     orted --no-daemonize --bootproxy 1 --name  
--num_procs 3 --vpid_start 0 --nodename  --universe 
schaffoe@node16:default-universe-3113 --nsreplica 
"0.0.0;tcp://192.168.1.16:60601" --gprreplica 
"0.0.0;tcp://192.168.1.16:60601"
[node16:03113] pls:tm: launching on node node16
[node16:03113] pls:tm: not oversubscribed -- setting mpi_yield_when_idle to 0
[node16:03113] pls:tm: executing: orted --no-daemonize --bootproxy 1 --name 
0.0.2 --num_procs 3 --vpid_start 0 --nodename node16 --universe 
schaffoe@node16:default-universe-3113 --nsreplica 
"0.0.0;tcp://192.168.1.16:60601" --gprreplica 
"0.0.0;tcp://192.168.1.16:60601"
[node16:03113] pls:tm: final top-level argv:
[node16:03113] pls:tm:     orted --no-daemonize --bootproxy 1 --name  
--num_procs 4 --vpid_start 0 --nodename  --universe 
schaffoe@node16:default-universe-3113 --nsreplica 
"0.0.0;tcp://192.168.1.16:60601" --gprreplica 
"0.0.0;tcp://192.168.1.16:60601"
[node16:03113] pls:tm: launching on node node16
[node16:03113] pls:tm: not oversubscribed -- setting mpi_yield_when_idle to 0
[node16:03113] pls:tm: executing: orted --no-daemonize --bootproxy 1 --name 
0.0.3 --num_procs 4 --vpid_start 0 --nodename node16 --universe 
schaffoe@node16:default-universe-3113 --nsreplica 
"0.0.0;tcp://192.168.1.16:60601" --gprreplica 
"0.0.0;tcp://192.168.1.16:60601"
mpiexec: killing job...
[node16:03113] pls:tm: final top-level argv:
[node16:03113] pls:tm:     orted --no-daemonize --bootproxy 1 --name  
--num_procs 5 --vpid_start 0 --nodename  --universe 
schaffoe@node16:default-universe-3113 --nsreplica 
"0.0.0;tcp://192.168.1.16:60601" --gprreplica 
"0.0.0;tcp://192.168.1.16:60601"
[node16:03113] pls:tm: launching on node node16
[node16:03113] pls:tm: not oversubscribed -- setting mpi_yield_when_idle to 0
[node16:03113] pls:tm: executing: orted --no-daemonize --bootproxy 1 --name 
0.0.4 --num_procs 5 --vpid_start 0 --nodename node16 --universe 
schaffoe@node16:default-universe-3113 --nsreplica 
"0.0.0;tcp://192.168.1.16:60601" --gprreplica 
"0.0.0;tcp://192.168.1.16:60601"
[node16:03113] pls:tm: final top-level argv:
[node16:03113] pls:tm:     orted --no-daemonize --bootproxy 1 --name  
--num_procs 6 --vpid_start 0 --nodename  --universe 
schaffoe@node16:default-universe-3113 --nsreplica 
"0.0.0;tcp://192.168.1.16:60601" --gprreplica 
"0.0.0;tcp://192.168.1.16:60601"
[node16:03113] pls:tm: launching on node node16
[node16:03113] pls:tm: not oversubscribed -- setting mpi_yield_when_idle to 0
[node16:03113] pls:tm: executing: orted --no-daemonize --bootproxy 1 --name 
0.0.5 --num_procs 6 --vpid_start 0 --nodename node16 --universe 
schaffoe@node16:default-universe-3113 --nsreplica 
"0.0.0;tcp://192.168.1.16:60601" --gprreplica 
"0.0.0;tcp://192.168.1.16:60601"
--
WARNING: mpiexec encountered an abnormal exit.

This means that mpiexec exited before it received notification that all
started processes had terminated.  You should double check and ensure
that there are no runaway processes still executing.

Re: [OMPI users] Openmpi 1.0.3svn10374 not launching apps through TM interface

2006-06-16 Thread Jeff Squyres (jsquyres)
Try two things:

1. Use the pbsdsh command to try to launch a trivial non-MPI application (e.g., 
hostname):

(inside a PBS job)
pbsdsh - -v hostname

where  is the number of vcpu's in your job.

2. If that works, try mpirun'ing a trivial non-MPI application (e.g., hostname):

(inside a PBS job)
mpirun -np  -d --mca pls_tm_debug 1 hostname  

If #1 fails, then there is something wrong with your Torque installation 
(pbsdsh uses the same PBS API that Open MPI does), and Open MPI's failure is a 
symptom of the underlying problem.  If #1 succeeds and #2 fails, send back the 
details and let's go from there.
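
For reference, both steps can go in one throwaway batch script -- this is
only a sketch (the job name, node/ppn counts, and -np value are
placeholders; adjust them to your allocation):

 #!/bin/sh
 #PBS -N ompi-tm-test
 #PBS -l nodes=2:ppn=2

 cd $PBS_O_WORKDIR

 # Step 1: launch through the TM API directly, no Open MPI involved
 pbsdsh -v hostname

 # Step 2: launch the same trivial program through Open MPI's TM launcher
 mpirun -np 4 -d --mca pls_tm_debug 1 hostname

If step 1 prints the node names and step 2 does not, that isolates the
problem to the Open MPI launch path.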

Thanks!


> -Original Message-
> From: users-boun...@open-mpi.org 
> [mailto:users-boun...@open-mpi.org] On Behalf Of Martin Schafföner
> Sent: Friday, June 16, 2006 3:27 AM
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] Openmpi 1.0.3svn10374 not launching 
> apps through TM interface
> 
> On Thursday 15 June 2006 16:08, Brock Palen wrote:
> > Jeez, I really can't read this morning -- you are using torque and the
> > mpiexec is the one with openmpi.  I can't help you then; someone else
> > is going to have to.  Sorry
> 
> Would it be much of a hassle to run a very simple mpi job (maybe even 
> in an interactive PBS session?) with the mpiexec arguments -d --mca 
> pls_tm_debug 1?  Could you then post the output so that I (and maybe 
> others) have a reference?
> 
> Regards,
> -- 
> Martin Schafföner
> 
> Cognitive Systems Group, Institute of Electronics, Signal 
> Processing and 
> Communication Technologies, Department of Electrical Engineering, 
> Otto-von-Guericke University Magdeburg
> Phone: +49 391 6720063
> 
> 



Re: [OMPI users] Openmpi 1.0.3svn10374 not launching apps through TM interface

2006-06-16 Thread Martin Schafföner
On Thursday 15 June 2006 16:08, Brock Palen wrote:
> Jeez, I really can't read this morning -- you are using torque and the
> mpiexec is the one with openmpi.  I can't help you then; someone else
> is going to have to.  Sorry

Would it be much of a hassle to run a very simple mpi job (maybe even in an 
interactive PBS session?) with the mpiexec arguments -d --mca pls_tm_debug 1? 
Could you then post the output so that I (and maybe others) have a reference?

Regards,
-- 
Martin Schafföner

Cognitive Systems Group, Institute of Electronics, Signal Processing and 
Communication Technologies, Department of Electrical Engineering, 
Otto-von-Guericke University Magdeburg
Phone: +49 391 6720063