Hello all,

I've been trying to set up a small test cluster with a dual Opteron head and Athlon nodes. My environment in both cases is Gentoo, and the nodes boot off PXE using an image built and stored on the master node. I chroot into the node's environment using:

    linux32 chroot ${ROOT} /bin/bash

to cross over the 64/32-bit barrier. My user's home directory is loop-mounted into that environment and NFS-exported to the nodes.

I build OpenMPI in the following way, in the build folder of OpenMPI-1.1:

    ./configure --cache-file=config_`uname -m`.cache --enable-pretty-print-stacktrace --prefix=$HOME/openmpi_`uname -m`
    make -j4 && make install

I run these exact same commands in both the Opteron environment and the chrooted environment for the Athlon machines. This gives me the following folders in my $HOME:

    /home/kyron/openmpi_i686
    /home/kyron/openmpi_x86_64

But for some reason, on the Athlon node (in their image on the server, I should say), OpenMPI still doesn't seem to be built correctly, since it crashes as follows:

    kyron@node0 ~ $ mpirun -np 1 uptime
    Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
    Failing at addr:(nil)
    [0] func:/home/kyron/openmpi_i686/lib/libopal.so.0 [0xb7f6258f]
    [1] func:[0xffffe440]
    [2] func:/home/kyron/openmpi_i686/lib/liborte.so.0(orte_init_stage1+0x1d7) [0xb7fa0227]
    [3] func:/home/kyron/openmpi_i686/lib/liborte.so.0(orte_system_init+0x23) [0xb7fa3683]
    [4] func:/home/kyron/openmpi_i686/lib/liborte.so.0(orte_init+0x5f) [0xb7f9ff7f]
    [5] func:mpirun(orterun+0x255) [0x804a015]
    [6] func:mpirun(main+0x22) [0x8049db6]
    [7] func:/lib/tls/libc.so.6(__libc_start_main+0xdb) [0xb7de8f0b]
    [8] func:mpirun [0x8049d11]
    *** End of error message ***
    Segmentation fault

The crash happens both in the chrooted environment and on the nodes. I configured both systems to have Linux and POSIX threads, though I see OpenMPI is calling the POSIX version (a message on the mailing list had hinted at keeping the Linux threads around; I have to anyway, since some apps like Matlab extensions still depend on them). The following is the output for the libc info:

    kyron@headless ~ $ /lib/tls/libc.so.6
    GNU C Library stable release version 2.3.6, by Roland McGrath et al.
    Copyright (C) 2005 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.
    There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
    PARTICULAR PURPOSE.
    Compiled by GNU CC version 4.1.1 (Gentoo 4.1.1).
    Compiled on a Linux 2.6.11 system on 2006-07-14.
    Available extensions:
            GNU libio by Per Bothner
            crypt add-on version 2.1 by Michael Glad and others
            Native POSIX Threads Library by Ulrich Drepper et al
            The C stubs add-on version 2.1.2.
            GNU Libidn by Simon Josefsson
            BIND-8.2.3-T5B
            NIS(YP)/NIS+ NSS modules 0.19 by Thorsten Kukuk
    Thread-local storage support included.
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/libc/bugs.html>.

I am attaching the config.log and ompi_info for both platforms. Before sending this e-mail I tried compiling OpenMPI on one of the nodes (booted off the image) and I am getting the exact same problem, so chroot vs. local build doesn't seem to be a factor.

The attached file contains:

    config.log.x86_64  <-- config log for the Opteron build (works locally)
    config.log_node0   <-- config log for the Athlon build (on the node)
    ompi_info.i686     <-- ompi_info on the Athlon node
    ompi_info.x86_64   <-- ompi_info on the Opteron master

Thanks,

-- 
Eric Thibodeau
Neural Bucket Solutions Inc.
T. (514) 736-1436
C. (514) 710-0517
ENV_info.tbz
Description: application/tbz