We don't have a strong desire to fix this in 1.2.7 -- especially since
you're the first person ever to run across this issue. :-)
Looks like this is easy enough to put into v1.3, though.
On Jun 23, 2008, at 9:52 AM, Todd Gamblin wrote:
Thanks for pointing this out (I'm not sure how I got that wrong in
the test) -- making the test program do the right thing gives:
(merle):test$ mpirun -np 4 test
before MPI_Init:
PWD: /home/tgamblin
getcwd: /home/tgamblin/test
before MPI_Init:
PWD: /home/tgamblin
getcwd: /home/tgamblin/test
etc...
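The fixed testdir() just prints the buffer that getcwd() filled in, instead of printing $PWD a second time:

void testdir(const char *where) {
    char buf[1024];
    getcwd(buf, 1024);
    ostringstream tmp;
    tmp << where << ":" << endl
        << "\tPWD:\t" << getenv("PWD") << endl
        << "\tgetcwd:\t" << buf << endl;   // was getenv("PWD") -- the bug
    cout << tmp.str();
}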
-Todd
On Jun 23, 2008, at 5:03 AM, Jeff Squyres wrote:
I think the issue here is that your test app is checking $PWD, not
getcwd().
If you call getcwd(), you'll get the right answer (see my tests
below). But your point is taken: perhaps OMPI should set PWD to the
correct value before launching the user app.
[5:01] svbu-mpi:~/tmp % salloc -N 1 tcsh
salloc: Granted job allocation 5311
[5:01] svbu-mpi:~/tmp % mpirun -np 1 pwd
/home/jsquyres/tmp
[5:01] svbu-mpi:~/tmp % mpirun -np 1 -wdir ~/mpi pwd
/home/jsquyres/mpi
[5:01] svbu-mpi:~/tmp % cat foo.c
#include <stdio.h>
#include <unistd.h>
int main() {
    char buf[BUFSIZ];
    getcwd(buf, BUFSIZ);          /* the OS's answer, independent of $PWD */
    printf("CWD is %s\n", buf);
    return 0;
}
[5:01] svbu-mpi:~/tmp % gcc foo.c -o foo
[5:01] svbu-mpi:~/tmp % mpirun -np 1 foo
CWD is /home/jsquyres/tmp
[5:01] svbu-mpi:~/tmp % mpirun -np 1 -wdir ~/mpi ~/tmp/foo
CWD is /home/jsquyres/mpi
[5:01] svbu-mpi:~/tmp %
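Setting PWD before the launch would amount to something like this -- a
sketch of the idea only, not actual Open MPI code (launch_app is a
hypothetical name): after the launcher has chdir()'d to the requested
working directory, overwrite any stale $PWD inherited from the daemon's
environment before exec'ing the user's app.

#include <cstdio>
#include <cstdlib>
#include <unistd.h>

// Sketch only -- hypothetical launch_app, not OMPI's real launch path.
static int launch_app(char **argv) {
    char cwd[BUFSIZ];
    if (getcwd(cwd, sizeof(cwd)) != NULL) {
        setenv("PWD", cwd, 1);   // 1 = overwrite any existing value
    }
    execvp(argv[0], argv);       // returns only on failure
    perror("execvp");
    return 1;
}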
On Jun 22, 2008, at 12:14 AM, Todd Gamblin wrote:
I'm having trouble getting OpenMPI to set the working directory
properly when running jobs on a Linux cluster. I made a test
program (at end of post) that recreates the problem pretty well by
just printing out the results of getcwd(). Here's output both
with and without using -wdir:
(merle):~$ cd test
(merle):test$ mpirun -np 2 test
before MPI_Init:
PWD: /home/tgamblin
getcwd: /home/tgamblin
before MPI_Init:
PWD: /home/tgamblin
getcwd: /home/tgamblin
after MPI_Init:
PWD: /home/tgamblin
getcwd: /home/tgamblin
after MPI_Init:
PWD: /home/tgamblin
getcwd: /home/tgamblin
(merle):test$ mpirun -np 2 -wdir /home/tgamblin/test test
before MPI_Init:
PWD: /home/tgamblin
getcwd: /home/tgamblin
before MPI_Init:
PWD: /home/tgamblin
getcwd: /home/tgamblin
after MPI_Init:
PWD: /home/tgamblin
getcwd: /home/tgamblin
after MPI_Init:
PWD: /home/tgamblin
getcwd: /home/tgamblin
Shouldn't these print out /home/tgamblin/test? Also, this is even
stranger:
(merle):test$ mpirun -np 2 pwd
/home/tgamblin/test
/home/tgamblin/test
I feel like my program should output the same thing as pwd.
I'm using OpenMPI 1.2.6, and the cluster has 8 nodes, each with two
dual-core Woodcrests (32 cores total). There are two TCP networks on
this cluster: one that the head node uses to talk to the compute
nodes, and a gigabit network on which the compute nodes can reach
each other (but not the head node). I have
"btl_tcp_if_include = eth2" in my MCA params file to keep the compute
nodes talking to each other over the fast interconnect, and I've
pasted ifconfig output for the head node and for one compute node
below. Also, if it helps, the home directories on this machine are
mounted via autofs.
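(For concreteness -- assuming the usual per-user location -- my
$HOME/.openmpi/mca-params.conf just contains the line:)

btl_tcp_if_include = eth2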
This is causing problems because I'm using apps that look for their
config files in the working directory. Please let me know if you
guys have any idea what's going on.
Thanks!
-Todd
TEST PROGRAM:
#include "mpi.h"
#include <cstdlib>
#include <iostream>
#include <sstream>
using namespace std;
void testdir(const char*where) {
char buf[1024];
getcwd(buf, 1024);
ostringstream tmp;
tmp << where << ":" << endl
<< "\tPWD:\t"<< getenv("PWD") << endl
<< "\tgetcwd:\t"<< getenv("PWD") << endl;
cout << tmp.str();
}
int main(int argc, char **argv) {
testdir("before MPI_Init");
MPI_Init(&argc, &argv);
testdir("after MPI_Init");
MPI_Finalize();
}
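(For reference, I build and run it with something like the following,
assuming the source file is named test.cxx:)

(merle):test$ mpicxx test.cxx -o test
(merle):test$ mpirun -np 2 test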
HEAD NODE IFCONFIG:
eth0 Link encap:Ethernet HWaddr 00:18:8B:2F:3D:90
inet addr:10.6.1.1 Bcast:10.6.1.255 Mask:255.255.255.0
inet6 addr: fe80::218:8bff:fe2f:3d90/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1579250319 errors:0 dropped:0 overruns:0 frame:0
TX packets:874273636 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2361367146846 (2.1 TiB) TX bytes:85373933521 (79.5 GiB)
Interrupt:169 Memory:f4000000-f4011100
eth0:1 Link encap:Ethernet HWaddr 00:18:8B:2F:3D:90
inet addr:10.6.2.1 Bcast:10.6.2.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:169 Memory:f4000000-f4011100
eth1 Link encap:Ethernet HWaddr 00:18:8B:2F:3D:8E
inet addr:152.54.1.21 Bcast:152.54.3.255 Mask:255.255.252.0
inet6 addr: fe80::218:8bff:fe2f:3d8e/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:14436523 errors:0 dropped:0 overruns:0 frame:0
TX packets:7357596 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2354451258 (2.1 GiB) TX bytes:2218390772 (2.0 GiB)
Interrupt:169 Memory:f8000000-f8011100
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:540889623 errors:0 dropped:0 overruns:0 frame:0
TX packets:540889623 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:63787539844 (59.4 GiB) TX bytes:63787539844 (59.4 GiB)
COMPUTE NODE IFCONFIG:
(compute-0-0):~$ ifconfig
eth0 Link encap:Ethernet HWaddr 00:13:72:FA:42:ED
inet addr:10.6.1.254 Bcast:10.6.1.255 Mask:255.255.255.0
inet6 addr: fe80::213:72ff:fefa:42ed/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:200637 errors:0 dropped:0 overruns:0 frame:0
TX packets:165336 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:187105568 (178.4 MiB) TX bytes:26263945 (25.0 MiB)
Interrupt:169 Memory:f8000000-f8011100
eth2 Link encap:Ethernet HWaddr 00:15:17:0E:9E:68
inet addr:10.6.2.254 Bcast:10.6.2.255 Mask:255.255.255.0
inet6 addr: fe80::215:17ff:fe0e:9e68/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:20 errors:0 dropped:0 overruns:0 frame:0
TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1280 (1.2 KiB) TX bytes:590 (590.0 b)
Base address:0xdce0 Memory:fc3e0000-fc400000
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:65 errors:0 dropped:0 overruns:0 frame:0
TX packets:65 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:4376 (4.2 KiB) TX bytes:4376 (4.2 KiB)
--
Jeff Squyres
Cisco Systems
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
--
Jeff Squyres
Cisco Systems