Hello all,

        I've been trying to set up a small test cluster with a dual Opteron 
head and Athlon nodes. My environment in both cases is Gentoo and the nodes 
boot off PXE using an image built and stored on the master node. I chroot into 
the node's environment using:

linux32 chroot ${ROOT} /bin/bash

This crosses the 64/32-bit barrier. My user's home directory is loop-mounted 
into that environment and NFS-exported to the nodes. I build OpenMPI in the 
following way:

In the build folder of OpenMPI-1.1:
./configure --cache-file=config_`uname -m`.cache \
    --enable-pretty-print-stacktrace --prefix=$HOME/openmpi_`uname -m`
make -j4 && make install

I run these exact same commands in both the Opteron environment and the 
chrooted environment for the Athlon machines. This gives me the following 
folders in my $HOME:
/home/kyron/openmpi_i686
/home/kyron/openmpi_x86_64
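
Since $HOME is shared over NFS, each machine has to pick the prefix that 
matches its own architecture. A minimal sketch of how that selection could be 
done, assuming a POSIX-style shell; ompi_prefix and use_ompi are hypothetical 
helper names, not part of OpenMPI:

```shell
# Hypothetical helpers (not part of Open MPI); names are illustrative.

# Print the install prefix for an architecture, defaulting to `uname -m`.
ompi_prefix() {
    arch="${1:-$(uname -m)}"
    printf '%s/openmpi_%s\n' "$HOME" "$arch"
}

# Put the matching build first on PATH and LD_LIBRARY_PATH.
use_ompi() {
    prefix="$(ompi_prefix "${1:-}")"
    PATH="$prefix/bin:$PATH"
    LD_LIBRARY_PATH="$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export PATH LD_LIBRARY_PATH
}
```

Calling use_ompi with no argument from a shell startup file would select 
whichever folder matches what `uname -m` reports in that environment.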

But for some reason, on the Athlon nodes (or rather, in their image on the 
server), OpenMPI still doesn't seem to be built correctly, since it crashes as 
follows:

kyron@node0 ~ $ mpirun -np 1 uptime
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:(nil)
[0] func:/home/kyron/openmpi_i686/lib/libopal.so.0 [0xb7f6258f]
[1] func:[0xffffe440]
[2] func:/home/kyron/openmpi_i686/lib/liborte.so.0(orte_init_stage1+0x1d7) [0xb7fa0227]
[3] func:/home/kyron/openmpi_i686/lib/liborte.so.0(orte_system_init+0x23) [0xb7fa3683]
[4] func:/home/kyron/openmpi_i686/lib/liborte.so.0(orte_init+0x5f) [0xb7f9ff7f]
[5] func:mpirun(orterun+0x255) [0x804a015]
[6] func:mpirun(main+0x22) [0x8049db6]
[7] func:/lib/tls/libc.so.6(__libc_start_main+0xdb) [0xb7de8f0b]
[8] func:mpirun [0x8049d11]
*** End of error message ***
Segmentation fault
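
Because the build crosses the 64/32-bit boundary, one sanity check is to 
confirm that the libraries under each prefix really are of the expected ELF 
class. A minimal sketch, assuming a POSIX od is available; elf_bits is a 
hypothetical helper name:

```shell
# Hypothetical helper: report whether a file is a 32- or 64-bit ELF object.
# It reads the first five bytes: the 4-byte ELF magic (7f 45 4c 46)
# followed by the EI_CLASS byte (01 = 32-bit, 02 = 64-bit).
elf_bits() {
    header=$(od -An -tx1 -N5 "$1" | tr -d ' \n')
    case "$header" in
        7f454c4601) echo 32 ;;
        7f454c4602) echo 64 ;;
        *)          echo unknown ;;
    esac
}
```

For a correct i686 build, 
elf_bits "$HOME/openmpi_i686/lib/libopal.so.0" would be expected to print 32.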

The crash happens both in the chrooted env and on the nodes. I configured both 
systems to have Linux and POSIX threads, though I see OpenMPI is calling the 
POSIX version (a message on the mailing list had hinted at keeping the Linux 
threads around... I have to anyway, since some apps like Matlab extensions 
still depend on them...). The following is the output for the libc info:

kyron@headless ~ $ /lib/tls/libc.so.6
GNU C Library stable release version 2.3.6, by Roland McGrath et al.
Copyright (C) 2005 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.1.1 (Gentoo 4.1.1).
Compiled on a Linux 2.6.11 system on 2006-07-14.
Available extensions:
        GNU libio by Per Bothner
        crypt add-on version 2.1 by Michael Glad and others
        Native POSIX Threads Library by Ulrich Drepper et al
        The C stubs add-on version 2.1.2.
        GNU Libidn by Simon Josefsson
        BIND-8.2.3-T5B
        NIS(YP)/NIS+ NSS modules 0.19 by Thorsten Kukuk
Thread-local storage support included.
For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.
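
To see which threading library a given environment resolves to by default, 
glibc exposes it through getconf. A minimal sketch, assuming a glibc system; 
thread_lib is a hypothetical helper name:

```shell
# Hypothetical helper: print the threading library glibc reports,
# or "unknown" where getconf is missing or does not know the variable.
thread_lib() {
    getconf GNU_LIBPTHREAD_VERSION 2>/dev/null || echo unknown
}
```

With NPTL active this typically prints NPTL followed by the glibc version. On 
systems of this vintage that ship both libraries, setting LD_ASSUME_KERNEL 
(for example to 2.4.1) was the usual way to make the loader pick LinuxThreads 
instead; whether that applies to this particular setup is an assumption.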

I am attaching the config.log and ompi_info for both platforms. Before sending 
this e-mail I tried compiling OpenMPI on one of the nodes (booted off the 
image) and I am getting the exact same problem (so chroot vs local build 
doesn't seem to be a factor). The attached file contains:

config.log.x86_64   <-- config log for the Opteron build (works locally)
config.log_node0    <-- config log for the Athlon build (on the node)
ompi_info.i686      <-- ompi_info on the Athlon node
ompi_info.x86_64    <-- ompi_info on the Opteron master

Thanks,

-- 
Eric Thibodeau
Neural Bucket Solutions Inc.
T. (514) 736-1436
C. (514) 710-0517

Attachment: ENV_info.tbz
Description: application/tbz
