On 18 July 2007 at 19:14, Dirk Eddelbuettel wrote:
| 
| Hi Tim,
| 
| Thanks for the follow-up
| 
| On 18 July 2007 at 17:22, Tim Prins wrote:
| | <snip>
| | > Yes, this helps tremendously.  I installed rsh, and now it pretty much
| | > works.
| | Glad this worked out for you.
| | 
| | >
| | > The one missing detail is that I can't seem to get the stdout/stderr
| | > output.  For example:
| | >
| | > $ orterun -np 1 uptime
| | > $ uptime
| | > 18:24:27 up 13 days,  3:03,  0 users,  load average: 0.00, 0.03, 0.00
| | >
| | > The man page indicates that stdout/stderr is supposed to come back to
| | > the stdout/stderr of the orterun process.  Any ideas on why this isn't
| | > working?
| | It should work. However, we currently have some I/O forwarding problems
| | which show up in some environments that will (hopefully) be fixed in the
| | next release. As far as I know, the problem seems to happen mostly with
| | non-mpi applications.
| | 
| | Try running a simple mpi application, such as:
| | 
| | #include <stdio.h>
| | #include "mpi.h"
| | 
| | int main(int argc, char* argv[])
| | {
| |     int rank, size;
| | 
| |     MPI_Init(&argc, &argv);
| |     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
| |     MPI_Comm_size(MPI_COMM_WORLD, &size);
| |     printf("Hello, world, I am %d of %d\n", rank, size);
| |     MPI_Finalize();
| | 
| |     return 0;
| | }
| | 
| | If that works fine, then it is probably our problem, and not a problem with 
| | your setup.
| | 
| | Sorry I don't have a better answer :(
| 
| That works (and I use the same Debian openmpi 1.2.3-1 set of packages Adam
| has): 
| 
| edd@basebud:~> opalcc -o /tmp/openmpitest /tmp/openmpitest.c -lmpi
| edd@basebud:~> orterun -np 4 /tmp/openmpitest
| Hello, world, I am 2 of 4
| Hello, world, I am 1 of 4
| Hello, world, I am 0 of 4
| Hello, world, I am 3 of 4
| edd@basebud:~>                
| 
| I was toying with this at work earlier, and it was hanging there (using
| hostname or uptime as the token binaries) as soon as I increased the np
| parameter beyond 1. 
| 
| It works here:
| 
| edd@basebud:~> orterun -np 4 hostname
| basebud
| basebud
| basebud
| basebud
| edd@basebud:~>
| 
| I have slurm-llnl test packages installed at work but not here. Maybe I need
| to dig a bit more into slurm.  (Adam: slurm package should be forthcoming.
| I can point you to the snapshots from the fellow whom I mentor on this.)

Indeed, at work it hangs once I up the np parameter:

foo:~> orterun -np 4 ./openmpitest
Hello, world, I am 0 of 4
Hello, world, I am 1 of 4
Hello, world, I am 2 of 4
Hello, world, I am 3 of 4
orterun: killing job...

Killed
foo:~> orterun -np 4 -H localhost ./openmpitest
Hello, world, I am 1 of 4
Hello, world, I am 0 of 4
Hello, world, I am 2 of 4
Hello, world, I am 3 of 4
foo:~>     

Restricting it to localhost helps.  Any ideas?

x86 multicore/multicpu, Open MPI 1.2.3, Slurm 1.2.11, Ubuntu 7.04 plus a
handful of hand-compiled packages from Debian unstable. More details are
available; just tell me what is needed and how best to compile it.

Dirk

-- 
Hell, there are no rules here - we're trying to accomplish something. 
                                                  -- Thomas A. Edison
