Gilles,

Ahh, I didn't know the current status. Thank you for the notice!

Thanks,
Takahiro Kawashima

> Kawashima-san,
> 
> I'd rather consider this a bug in the README (!)
> 
> 
> Heterogeneous support was broken for some time, but it was
> eventually fixed.
> 
> Truth is, there are *very* limited resources (both human and hardware)
> maintaining heterogeneous support, but that does not mean heterogeneous
> support should not be used, nor that bug reports will be ignored.
> 
> Cheers,
> 
> Gilles
> 
> On 2014/12/24 9:26, Kawashima, Takahiro wrote:
> > Hi Siegmar,
> >
> > Heterogeneous environments are not officially supported.
> >
> > The README of Open MPI master says:
> >
> > --enable-heterogeneous
> >   Enable support for running on heterogeneous clusters (e.g., machines
> >   with different endian representations).  Heterogeneous support is
> >   disabled by default because it imposes a minor performance penalty.
> >
> >   *** THIS FUNCTIONALITY IS CURRENTLY BROKEN - DO NOT USE ***
> >
> >> Hi,
> >>
> >> Today I installed openmpi-dev-602-g82c02b4 on my machines (Solaris 10
> >> Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with gcc-4.9.2
> >> and the new Solaris Studio 12.4 compilers. All build processes finished
> >> without errors, but I have a problem running a very small program. It
> >> works for three processes but hangs for six processes. I see the same
> >> behaviour with both compilers.
> >>
> >> tyr small_prog 139 time; mpiexec -np 3 --host sunpc1,linpc1,tyr init_finalize; time
> >> 827.161u 210.126s 30:51.08 56.0%        0+0k 4151+20io 2898pf+0w
> >> Hello!
> >> Hello!
> >> Hello!
> >> 827.886u 210.335s 30:54.68 55.9%        0+0k 4151+20io 2898pf+0w
> >> tyr small_prog 140 time; mpiexec -np 6 --host sunpc1,linpc1,tyr init_finalize; time
> >> 827.946u 210.370s 31:15.02 55.3%        0+0k 4151+20io 2898pf+0w
> >> ^CKilled by signal 2.
> >> Killed by signal 2.
> >> 869.242u 221.644s 33:40.54 53.9%        0+0k 4151+20io 2898pf+0w
> >> tyr small_prog 141 
> >>
> >> tyr small_prog 145 ompi_info | grep -e "Open MPI repo revision:" -e "C compiler:"
> >>   Open MPI repo revision: dev-602-g82c02b4
> >>               C compiler: cc
> >> tyr small_prog 146 
> >>
> >>
> >> tyr small_prog 146 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
> >> GNU gdb (GDB) 7.6.1
> >> ...
> >> (gdb) run -np 3 --host sunpc1,linpc1,tyr init_finalize
> >> Starting program: /usr/local/openmpi-1.9.0_64_cc/bin/mpiexec -np 3 --host sunpc1,linpc1,tyr init_finalize
> >> [Thread debugging using libthread_db enabled]
> >> [New Thread 1 (LWP 1)]
> >> [New LWP    2        ]
> >> Hello!
> >> Hello!
> >> Hello!
> >> [LWP    2         exited]
> >> [New Thread 2        ]
> >> [Switching to Thread 1 (LWP 1)]
> >> sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to satisfy query
> >> (gdb) run -np 6 --host sunpc1,linpc1,tyr init_finalize
> >> The program being debugged has been started already.
> >> Start it from the beginning? (y or n) y
> >>
> >> Starting program: /usr/local/openmpi-1.9.0_64_cc/bin/mpiexec -np 6 --host sunpc1,linpc1,tyr init_finalize
> >> [Thread debugging using libthread_db enabled]
> >> [New Thread 1 (LWP 1)]
> >> [New LWP    2        ]
> >> ^CKilled by signal 2.
> >> Killed by signal 2.
> >>
> >> Program received signal SIGINT, Interrupt.
> >> [Switching to Thread 1 (LWP 1)]
> >> 0xffffffff7d1dc6b0 in __pollsys () from /lib/sparcv9/libc.so.1
> >> (gdb) bt
> >> #0  0xffffffff7d1dc6b0 in __pollsys () from /lib/sparcv9/libc.so.1
> >> #1  0xffffffff7d1cb468 in _pollsys () from /lib/sparcv9/libc.so.1
> >> #2  0xffffffff7d170ed8 in poll () from /lib/sparcv9/libc.so.1
> >> #3  0xffffffff7e69a630 in poll_dispatch ()
> >>    from /usr/local/openmpi-1.9.0_64_cc/lib64/libopen-pal.so.0
> >> #4  0xffffffff7e6894ec in opal_libevent2021_event_base_loop ()
> >>    from /usr/local/openmpi-1.9.0_64_cc/lib64/libopen-pal.so.0
> >> #5  0x000000010000eb14 in orterun (argc=1757447168, argv=0xffffff7ed8550cff)
> >>     at ../../../../openmpi-dev-602-g82c02b4/orte/tools/orterun/orterun.c:1090
> >> #6  0x0000000100004e2c in main (argc=256, argv=0xffffff7ed8af5c00)
> >>     at ../../../../openmpi-dev-602-g82c02b4/orte/tools/orterun/main.c:13
> >> (gdb) 
> >>
> >> Any ideas? Unfortunately, I'm leaving for vacation, so I cannot test
> >> any patches until the end of the year. Nevertheless, I wanted to report
> >> the problem. At the moment I cannot test whether I get the same behaviour
> >> in a homogeneous environment with three machines, because the new version
> >> won't be available on the other machines until tomorrow. I used the
> >> following configure command.
> >>
> >> ../openmpi-dev-602-g82c02b4/configure --prefix=/usr/local/openmpi-1.9.0_64_cc \
> >>   --libdir=/usr/local/openmpi-1.9.0_64_cc/lib64 \
> >>   --with-jdk-bindir=/usr/local/jdk1.8.0/bin \
> >>   --with-jdk-headers=/usr/local/jdk1.8.0/include \
> >>   JAVA_HOME=/usr/local/jdk1.8.0 \
> >>   LDFLAGS="-m64 -mt" \
> >>   CC="cc" CXX="CC" FC="f95" \
> >>   CFLAGS="-m64 -mt" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
> >>   CPP="cpp" CXXCPP="cpp" \
> >>   CPPFLAGS="" CXXCPPFLAGS="" \
> >>   --enable-mpi-cxx \
> >>   --enable-cxx-exceptions \
> >>   --enable-mpi-java \
> >>   --enable-heterogeneous \
> >>   --enable-mpi-thread-multiple \
> >>   --with-threads=posix \
> >>   --with-hwloc=internal \
> >>   --without-verbs \
> >>   --with-wrapper-cflags="-m64 -mt" \
> >>   --with-wrapper-cxxflags="-m64 -library=stlport4" \
> >>   --with-wrapper-ldflags="-mt" \
> >>   --enable-debug \
> >>   |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc
> >>
> >> Furthermore, I used the following test program.
> >>
> >> #include <stdio.h>
> >> #include <stdlib.h>
> >> #include "mpi.h"
> >>
> >> int main (int argc, char *argv[])
> >> {
> >>   MPI_Init (&argc, &argv);
> >>   printf ("Hello!\n");
> >>   MPI_Finalize ();
> >>   return EXIT_SUCCESS;
> >> }
