Hi Siegmar,

Heterogeneous environments are not officially supported.

The README on Open MPI master says:

--enable-heterogeneous
  Enable support for running on heterogeneous clusters (e.g., machines
  with different endian representations).  Heterogeneous support is
  disabled by default because it imposes a minor performance penalty.

  *** THIS FUNCTIONALITY IS CURRENTLY BROKEN - DO NOT USE ***
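
Your host list mixes byte orders: SPARC is big-endian, while x86_64 is little-endian, so this run falls into the heterogeneous case the README warns about. If you want to confirm that on each machine, a small check along these lines might help. This is only a sketch; host_is_big_endian and host_byte_order are hypothetical helper names of mine, not part of Open MPI.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not an Open MPI function): returns 1 if the host
 * stores the most significant byte first (big-endian), 0 otherwise. */
static int host_is_big_endian(void)
{
    const uint32_t probe = 0x01020304u;
    unsigned char first_byte;

    /* Copy the byte at the lowest address of the 32-bit probe value. */
    memcpy(&first_byte, &probe, 1);
    return first_byte == 0x01;
}

/* Hypothetical helper: human-readable name of the host byte order. */
static const char *host_byte_order(void)
{
    return host_is_big_endian() ? "big-endian" : "little-endian";
}
```

Printing host_byte_order() next to the "Hello!" line in the test program would show which ranks run on which kind of machine.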

> Hi,
> 
> today I installed openmpi-dev-602-g82c02b4 on my machines (Solaris 10 Sparc,
> Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with gcc-4.9.2 and the
> new Solaris Studio 12.4 compilers. All build processes finished without
> errors, but I have a problem running a very small program. It works for
> three processes but hangs for six processes. I have the same behaviour
> for both compilers.
> 
> tyr small_prog 139 time; mpiexec -np 3 --host sunpc1,linpc1,tyr init_finalize; time
> 827.161u 210.126s 30:51.08 56.0%        0+0k 4151+20io 2898pf+0w
> Hello!
> Hello!
> Hello!
> 827.886u 210.335s 30:54.68 55.9%        0+0k 4151+20io 2898pf+0w
> tyr small_prog 140 time; mpiexec -np 6 --host sunpc1,linpc1,tyr init_finalize; time
> 827.946u 210.370s 31:15.02 55.3%        0+0k 4151+20io 2898pf+0w
> ^CKilled by signal 2.
> Killed by signal 2.
> 869.242u 221.644s 33:40.54 53.9%        0+0k 4151+20io 2898pf+0w
> tyr small_prog 141 
> 
> tyr small_prog 145 ompi_info | grep -e "Open MPI repo revision:" -e "C compiler:"
>   Open MPI repo revision: dev-602-g82c02b4
>               C compiler: cc
> tyr small_prog 146 
> 
> 
> tyr small_prog 146 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
> GNU gdb (GDB) 7.6.1
> ...
> (gdb) run -np 3 --host sunpc1,linpc1,tyr init_finalize
> Starting program: /usr/local/openmpi-1.9.0_64_cc/bin/mpiexec -np 3 --host sunpc1,linpc1,tyr init_finalize
> [Thread debugging using libthread_db enabled]
> [New Thread 1 (LWP 1)]
> [New LWP    2        ]
> Hello!
> Hello!
> Hello!
> [LWP    2         exited]
> [New Thread 2        ]
> [Switching to Thread 1 (LWP 1)]
> sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to satisfy query
> (gdb) run -np 6 --host sunpc1,linpc1,tyr init_finalize
> The program being debugged has been started already.
> Start it from the beginning? (y or n) y
> 
> Starting program: /usr/local/openmpi-1.9.0_64_cc/bin/mpiexec -np 6 --host sunpc1,linpc1,tyr init_finalize
> [Thread debugging using libthread_db enabled]
> [New Thread 1 (LWP 1)]
> [New LWP    2        ]
> ^CKilled by signal 2.
> Killed by signal 2.
> 
> Program received signal SIGINT, Interrupt.
> [Switching to Thread 1 (LWP 1)]
> 0xffffffff7d1dc6b0 in __pollsys () from /lib/sparcv9/libc.so.1
> (gdb) bt
> #0  0xffffffff7d1dc6b0 in __pollsys () from /lib/sparcv9/libc.so.1
> #1  0xffffffff7d1cb468 in _pollsys () from /lib/sparcv9/libc.so.1
> #2  0xffffffff7d170ed8 in poll () from /lib/sparcv9/libc.so.1
> #3  0xffffffff7e69a630 in poll_dispatch ()
>    from /usr/local/openmpi-1.9.0_64_cc/lib64/libopen-pal.so.0
> #4  0xffffffff7e6894ec in opal_libevent2021_event_base_loop ()
>    from /usr/local/openmpi-1.9.0_64_cc/lib64/libopen-pal.so.0
> #5  0x000000010000eb14 in orterun (argc=1757447168, argv=0xffffff7ed8550cff)
>     at ../../../../openmpi-dev-602-g82c02b4/orte/tools/orterun/orterun.c:1090
> #6  0x0000000100004e2c in main (argc=256, argv=0xffffff7ed8af5c00)
>     at ../../../../openmpi-dev-602-g82c02b4/orte/tools/orterun/main.c:13
> (gdb) 
> 
> Any ideas? Unfortunately I'm leaving for vacation, so I cannot test
> any patches until the end of the year. Nevertheless I wanted to report the
> problem. At the moment I cannot test if I have the same behaviour in a
> homogeneous environment with three machines because the new version isn't
> available before tomorrow on the other machines. I used the following
> configure command.
> 
> ../openmpi-dev-602-g82c02b4/configure --prefix=/usr/local/openmpi-1.9.0_64_cc \
>   --libdir=/usr/local/openmpi-1.9.0_64_cc/lib64 \
>   --with-jdk-bindir=/usr/local/jdk1.8.0/bin \
>   --with-jdk-headers=/usr/local/jdk1.8.0/include \
>   JAVA_HOME=/usr/local/jdk1.8.0 \
>   LDFLAGS="-m64 -mt" \
>   CC="cc" CXX="CC" FC="f95" \
>   CFLAGS="-m64 -mt" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
>   CPP="cpp" CXXCPP="cpp" \
>   CPPFLAGS="" CXXCPPFLAGS="" \
>   --enable-mpi-cxx \
>   --enable-cxx-exceptions \
>   --enable-mpi-java \
>   --enable-heterogeneous \
>   --enable-mpi-thread-multiple \
>   --with-threads=posix \
>   --with-hwloc=internal \
>   --without-verbs \
>   --with-wrapper-cflags="-m64 -mt" \
>   --with-wrapper-cxxflags="-m64 -library=stlport4" \
>   --with-wrapper-ldflags="-mt" \
>   --enable-debug \
>   |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc
> 
> Furthermore I used the following test program.
> 
> #include <stdio.h>
> #include <stdlib.h>
> #include "mpi.h"
> 
> int main (int argc, char *argv[])
> {
>   MPI_Init (&argc, &argv);
>   printf ("Hello!\n");
>   MPI_Finalize ();
>   return EXIT_SUCCESS;
> }
