Hi Don,
Thanks again for the detailed report. I guess we are a bit puzzled here.
Just a couple of questions to check with you on these systems:
a) Did you have to build the mvapich libraries separately on these
machines? Related to this, are the directories /usr/local and
/tmp/mvapich NFS-exported for sharing across the different machines? If
you had to build the libraries separately, please provide the
corresponding set of info from `jatoba' as well.
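If it helps, something like the following on each node should show whether
those directories are local or NFS-mounted (assuming the usual GNU/Linux
tools are available):
# df -T /usr/local /tmp/mvapich    (shows the filesystem type: nfs, ext3, etc.)
# mount | grep nfs                 (lists any NFS mounts)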
b) Could you provide some additional specifications of these two
machines? Kernel version, Linux distribution, HCA firmware version, and
the gen2 kernel and userspace versions? We could have asked you
earlier, but it just occurred to us that these might be relevant...
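For what it is worth, something along these lines should capture most of
that (I am guessing at the exact release file name, which varies by
distribution):
# uname -r                   (kernel version)
# cat /etc/redhat-release    (Linux distribution; or /etc/issue on other distros)
# ibv_devinfo                (HCA firmware and port state, if the gen2
                              userspace tools are installed)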
Looking ahead, we may need access to your systems to give it a shot,
if that is at all possible...
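In the meantime, it might also be worth ruling out a basic verbs-level
connectivity problem between the two nodes with the ibv_rc_pingpong example
that ships with libibverbs (assuming it was installed along with the
library). Something like:
on koa:            # ibv_rc_pingpong
on the other node: # ibv_rc_pingpong koa
If that also fails, the problem is probably below mvapich.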
Weikuan
On Mar 22, 2006, at 5:06 PM, [EMAIL PROTECTED] wrote:
Weikuan,
>
> However, let us look into this slightly differently. Does the problem
> happen to 2 processes on the same node too? If so, we can focus on such
No, when I run on only one machine (either one) I can execute multiple
copies of the job:
[EMAIL PROTECTED] cpi]# mpirun_mpd -np 2 ./cpi
pi is approximately 3.1416009869231241, Error is 0.0000083333333309
wall clock time = 0.000088
Process 0 on koa.az05.bull.com
Process 1 on koa.az05.bull.com
[EMAIL PROTECTED] cpi]# mpirun_mpd -np 4 ./cpi
Process 0 on koa.az05.bull.com
Process 1 on koa.az05.bull.com
Process 2 on koa.az05.bull.com
Process 3 on koa.az05.bull.com
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.000165
> a case and get it cleared up first. Let us say we use `koa'. Could you
> provide the following information to us? We can do that for
> mvapich-gen2 first, assuming the cause of the problem will be similar
> for the other case.
>
> a) echo $LD_LIBRARY_PATH
$LD_LIBRARY_PATH is null
> b) The modified build script. The template you used for mvapich-gen2
> should be make.mvapich.gen2.
> c) build/configure log file: config.log, make.log, etc.
These files are attached below.
> d) The output of this command:
> # mpicc -show -o cpi example/basic/cpi.c
[EMAIL PROTECTED] mvapich-gen2]# mpicc -show -o cpi examples/basic/cpi.c
gcc -DUSE_STDARG -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_UNISTD_H=1
-DHAVE_STDARG_H=1 -DUSE_STDARG=1 -DMALLOC_RET_VOID=1 -c
examples/basic/cpi.c -I/usr/local/mvapich/include
gcc -DUSE_STDARG -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_UNISTD_H=1
-DHAVE_STDARG_H=1 -DUSE_STDARG=1 -DMALLOC_RET_VOID=1
-L/usr/local/mvapich/lib cpi.o -o cpi -lmpich -L/usr/local/lib
-Wl,-rpath=/usr/local/lib -libverbs -lpthread
Also, here is the output of "ldd" on the executable file:
[EMAIL PROTECTED] mvapich-gen2]# ldd /home/ib/tests/mpi/cpi/cpi
libibverbs.so.1 => /usr/local/lib/libibverbs.so.1
(0x00002aaaaaaac000)
libpthread.so.0 => /lib64/tls/libpthread.so.0
(0x000000352cb00000)
libc.so.6 => /lib64/tls/libc.so.6 (0x000000352bc00000)
libsysfs.so.1 => /usr/lib64/libsysfs.so.1 (0x00000038b1900000)
libdl.so.2 => /lib64/libdl.so.2 (0x000000352bf00000)
/lib64/ld-linux-x86-64.so.2 (0x000000352ba00000)
-Don Albert-
<make.mvapich.gen2> <config.status> <config-mine.log> <install-mine.log>
<config.log> <make-mine.log>
--
Weikuan Yu, Computer Science, OSU
http://www.cse.ohio-state.edu/~yuw