Gilles,
Ahh, I didn't know the current status. Thank you for the notice!
Thanks,
Takahiro Kawashima
> Kawashima-san,
>
> I'd rather consider this a bug in the README (!)
>
>
> Heterogeneous support has been broken for some time, but it was
> eventually fixed.
>
> The truth is that there are *very* limited resources (both human and
> hardware) maintaining heterogeneous support, but that does not mean
> heterogeneous support should not be used, nor that bug reports will
> be ignored.
>
> Cheers,
>
> Gilles
>
> On 2014/12/24 9:26, Kawashima, Takahiro wrote:
> > Hi Siegmar,
> >
> > Heterogeneous environments are not officially supported.
> >
> > README of Open MPI master says:
> >
> > --enable-heterogeneous
> > Enable support for running on heterogeneous clusters (e.g., machines
> > with different endian representations). Heterogeneous support is
> > disabled by default because it imposes a minor performance penalty.
> >
> > *** THIS FUNCTIONALITY IS CURRENTLY BROKEN - DO NOT USE ***
> >
> >> Hi,
> >>
> >> Today I installed openmpi-dev-602-g82c02b4 on my machines (Solaris 10
> >> Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with gcc-4.9.2
> >> and the new Solaris Studio 12.4 compilers. All build processes finished
> >> without errors, but I have a problem running a very small program. It
> >> works for three processes but hangs for six processes. I see the same
> >> behaviour with both compilers.
> >>
> >> tyr small_prog 139 time; mpiexec -np 3 --host sunpc1,linpc1,tyr init_finalize; time
> >> 827.161u 210.126s 30:51.08 56.0% 0+0k 4151+20io 2898pf+0w
> >> Hello!
> >> Hello!
> >> Hello!
> >> 827.886u 210.335s 30:54.68 55.9% 0+0k 4151+20io 2898pf+0w
> >> tyr small_prog 140 time; mpiexec -np 6 --host sunpc1,linpc1,tyr init_finalize; time
> >> 827.946u 210.370s 31:15.02 55.3% 0+0k 4151+20io 2898pf+0w
> >> ^CKilled by signal 2.
> >> Killed by signal 2.
> >> 869.242u 221.644s 33:40.54 53.9% 0+0k 4151+20io 2898pf+0w
> >> tyr small_prog 141
> >>
> >> tyr small_prog 145 ompi_info | grep -e "Open MPI repo revision:" -e "C compiler:"
> >> Open MPI repo revision: dev-602-g82c02b4
> >> C compiler: cc
> >> tyr small_prog 146
> >>
> >>
> >> tyr small_prog 146 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
> >> GNU gdb (GDB) 7.6.1
> >> ...
> >> (gdb) run -np 3 --host sunpc1,linpc1,tyr init_finalize
> >> Starting program: /usr/local/openmpi-1.9.0_64_cc/bin/mpiexec -np 3 --host sunpc1,linpc1,tyr init_finalize
> >> [Thread debugging using libthread_db enabled]
> >> [New Thread 1 (LWP 1)]
> >> [New LWP 2 ]
> >> Hello!
> >> Hello!
> >> Hello!
> >> [LWP 2 exited]
> >> [New Thread 2 ]
> >> [Switching to Thread 1 (LWP 1)]
> >> sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to satisfy query
> >> (gdb) run -np 6 --host sunpc1,linpc1,tyr init_finalize
> >> The program being debugged has been started already.
> >> Start it from the beginning? (y or n) y
> >>
> >> Starting program: /usr/local/openmpi-1.9.0_64_cc/bin/mpiexec -np 6 --host sunpc1,linpc1,tyr init_finalize
> >> [Thread debugging using libthread_db enabled]
> >> [New Thread 1 (LWP 1)]
> >> [New LWP 2 ]
> >> ^CKilled by signal 2.
> >> Killed by signal 2.
> >>
> >> Program received signal SIGINT, Interrupt.
> >> [Switching to Thread 1 (LWP 1)]
> >> 0xffffffff7d1dc6b0 in __pollsys () from /lib/sparcv9/libc.so.1
> >> (gdb) bt
> >> #0 0xffffffff7d1dc6b0 in __pollsys () from /lib/sparcv9/libc.so.1
> >> #1 0xffffffff7d1cb468 in _pollsys () from /lib/sparcv9/libc.so.1
> >> #2 0xffffffff7d170ed8 in poll () from /lib/sparcv9/libc.so.1
> >> #3 0xffffffff7e69a630 in poll_dispatch ()
> >> from /usr/local/openmpi-1.9.0_64_cc/lib64/libopen-pal.so.0
> >> #4 0xffffffff7e6894ec in opal_libevent2021_event_base_loop ()
> >> from /usr/local/openmpi-1.9.0_64_cc/lib64/libopen-pal.so.0
> >> #5 0x000000010000eb14 in orterun (argc=1757447168, argv=0xffffff7ed8550cff)
> >> at ../../../../openmpi-dev-602-g82c02b4/orte/tools/orterun/orterun.c:1090
> >> #6 0x0000000100004e2c in main (argc=256, argv=0xffffff7ed8af5c00)
> >> at ../../../../openmpi-dev-602-g82c02b4/orte/tools/orterun/main.c:13
> >> (gdb)
> >>
> >> Any ideas? Unfortunately I'm leaving for vacation, so I cannot test
> >> any patches until the end of the year. Nevertheless, I wanted to report
> >> the problem. At the moment I cannot test whether I get the same behaviour
> >> in a homogeneous environment with three machines, because the new version
> >> won't be available on the other machines until tomorrow. I used the
> >> following configure command.
> >>
> >> ../openmpi-dev-602-g82c02b4/configure --prefix=/usr/local/openmpi-1.9.0_64_cc \
> >> --libdir=/usr/local/openmpi-1.9.0_64_cc/lib64 \
> >> --with-jdk-bindir=/usr/local/jdk1.8.0/bin \
> >> --with-jdk-headers=/usr/local/jdk1.8.0/include \
> >> JAVA_HOME=/usr/local/jdk1.8.0 \
> >> LDFLAGS="-m64 -mt" \
> >> CC="cc" CXX="CC" FC="f95" \
> >> CFLAGS="-m64 -mt" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
> >> CPP="cpp" CXXCPP="cpp" \
> >> CPPFLAGS="" CXXCPPFLAGS="" \
> >> --enable-mpi-cxx \
> >> --enable-cxx-exceptions \
> >> --enable-mpi-java \
> >> --enable-heterogeneous \
> >> --enable-mpi-thread-multiple \
> >> --with-threads=posix \
> >> --with-hwloc=internal \
> >> --without-verbs \
> >> --with-wrapper-cflags="-m64 -mt" \
> >> --with-wrapper-cxxflags="-m64 -library=stlport4" \
> >> --with-wrapper-ldflags="-mt" \
> >> --enable-debug \
> >> |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc
> >>
> >> Furthermore, I used the following test program.
> >>
> >> #include <stdio.h>
> >> #include <stdlib.h>
> >> #include "mpi.h"
> >>
> >> int main (int argc, char *argv[])
> >> {
> >>   MPI_Init (&argc, &argv);
> >>   printf ("Hello!\n");
> >>   MPI_Finalize ();
> >>   return EXIT_SUCCESS;
> >> }