Hi Siegmar,

Heterogeneous environments are not officially supported.

README of Open MPI master says:

--enable-heterogeneous
  Enable support for running on heterogeneous clusters (e.g., machines
  with different endian representations).  Heterogeneous support is
  disabled by default because it imposes a minor performance penalty.
  *** THIS FUNCTIONALITY IS CURRENTLY BROKEN - DO NOT USE ***

> Hi,
>
> today I installed openmpi-dev-602-g82c02b4 on my machines (Solaris 10 Sparc,
> Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with gcc-4.9.2 and the
> new Solaris Studio 12.4 compilers. All build processes finished without
> errors, but I have a problem running a very small program. It works for
> three processes but hangs for six processes. I have the same behaviour
> for both compilers.
>
> tyr small_prog 139 time; mpiexec -np 3 --host sunpc1,linpc1,tyr
> init_finalize; time
> 827.161u 210.126s 30:51.08 56.0% 0+0k 4151+20io 2898pf+0w
> Hello!
> Hello!
> Hello!
> 827.886u 210.335s 30:54.68 55.9% 0+0k 4151+20io 2898pf+0w
> tyr small_prog 140 time; mpiexec -np 6 --host sunpc1,linpc1,tyr
> init_finalize; time
> 827.946u 210.370s 31:15.02 55.3% 0+0k 4151+20io 2898pf+0w
> ^CKilled by signal 2.
> Killed by signal 2.
> 869.242u 221.644s 33:40.54 53.9% 0+0k 4151+20io 2898pf+0w
> tyr small_prog 141
>
> tyr small_prog 145 ompi_info | grep -e "Open MPI repo revision:" -e "C
> compiler:"
> Open MPI repo revision: dev-602-g82c02b4
> C compiler: cc
> tyr small_prog 146
>
>
> tyr small_prog 146 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
> GNU gdb (GDB) 7.6.1
> ...
> (gdb) run -np 3 --host sunpc1,linpc1,tyr init_finalize
> Starting program: /usr/local/openmpi-1.9.0_64_cc/bin/mpiexec -np 3 --host
> sunpc1,linpc1,tyr
> init_finalize
> [Thread debugging using libthread_db enabled]
> [New Thread 1 (LWP 1)]
> [New LWP 2 ]
> Hello!
> Hello!
> Hello!
> [LWP 2 exited]
> [New Thread 2 ]
> [Switching to Thread 1 (LWP 1)]
> sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to
> satisfy query
> (gdb) run -np 6 --host sunpc1,linpc1,tyr init_finalize
> The program being debugged has been started already.
> Start it from the beginning? (y or n) y
>
> Starting program: /usr/local/openmpi-1.9.0_64_cc/bin/mpiexec -np 6 --host
> sunpc1,linpc1,tyr
> init_finalize
> [Thread debugging using libthread_db enabled]
> [New Thread 1 (LWP 1)]
> [New LWP 2 ]
> ^CKilled by signal 2.
> Killed by signal 2.
>
> Program received signal SIGINT, Interrupt.
> [Switching to Thread 1 (LWP 1)]
> 0xffffffff7d1dc6b0 in __pollsys () from /lib/sparcv9/libc.so.1
> (gdb) bt
> #0 0xffffffff7d1dc6b0 in __pollsys () from /lib/sparcv9/libc.so.1
> #1 0xffffffff7d1cb468 in _pollsys () from /lib/sparcv9/libc.so.1
> #2 0xffffffff7d170ed8 in poll () from /lib/sparcv9/libc.so.1
> #3 0xffffffff7e69a630 in poll_dispatch ()
>     from /usr/local/openmpi-1.9.0_64_cc/lib64/libopen-pal.so.0
> #4 0xffffffff7e6894ec in opal_libevent2021_event_base_loop ()
>     from /usr/local/openmpi-1.9.0_64_cc/lib64/libopen-pal.so.0
> #5 0x000000010000eb14 in orterun (argc=1757447168, argv=0xffffff7ed8550cff)
>     at ../../../../openmpi-dev-602-g82c02b4/orte/tools/orterun/orterun.c:1090
> #6 0x0000000100004e2c in main (argc=256, argv=0xffffff7ed8af5c00)
>     at ../../../../openmpi-dev-602-g82c02b4/orte/tools/orterun/main.c:13
> (gdb)
>
> Any ideas? Unfortunately I'm leaving for vacation so that I cannot test
> any patches until the end of the year. Nevertheless I wanted to report the
> problem. At the moment I cannot test if I have the same behaviour in a
> homogeneous environment with three machines because the new version isn't
> available before tomorrow on the other machines. I used the following
> configure command.
>
> ../openmpi-dev-602-g82c02b4/configure --prefix=/usr/local/openmpi-1.9.0_64_cc
> \
>   --libdir=/usr/local/openmpi-1.9.0_64_cc/lib64 \
>   --with-jdk-bindir=/usr/local/jdk1.8.0/bin \
>   --with-jdk-headers=/usr/local/jdk1.8.0/include \
>   JAVA_HOME=/usr/local/jdk1.8.0 \
>   LDFLAGS="-m64 -mt" \
>   CC="cc" CXX="CC" FC="f95" \
>   CFLAGS="-m64 -mt" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
>   CPP="cpp" CXXCPP="cpp" \
>   CPPFLAGS="" CXXCPPFLAGS="" \
>   --enable-mpi-cxx \
>   --enable-cxx-exceptions \
>   --enable-mpi-java \
>   --enable-heterogeneous \
>   --enable-mpi-thread-multiple \
>   --with-threads=posix \
>   --with-hwloc=internal \
>   --without-verbs \
>   --with-wrapper-cflags="-m64 -mt" \
>   --with-wrapper-cxxflags="-m64 -library=stlport4" \
>   --with-wrapper-ldflags="-mt" \
>   --enable-debug \
>   |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc
>
> Furthermore I used the following test program.
>
> #include <stdio.h>
> #include <stdlib.h>
> #include "mpi.h"
>
> int main (int argc, char *argv[])
> {
>   MPI_Init (&argc, &argv);
>   printf ("Hello!\n");
>   MPI_Finalize ();
>   return EXIT_SUCCESS;
> }