Hmmm...well, this is indeed confusing. I see the following in your attached output:
[sunpc4.informatik.hs-fulda.de][[4083,1],2][../../../../../openmpi-1.9a1r27362/ompi/mca/btl/sctp/btl_sctp_proc.c:143:mca_btl_sctp_proc_create] mca_base_modex_recv: failed with return value=-13
[rs0.informatik.hs-fulda.de][[4083,1],3][../../../../../openmpi-1.9a1r27362/ompi/mca/btl/sctp/btl_sctp_proc.c:143:mca_btl_sctp_proc_create] mca_base_modex_recv: failed with return value=-13
[rs0.informatik.hs-fulda.de][[4083,1],3][../../../../../openmpi-1.9a1r27362/ompi/mca/btl/sctp/btl_sctp_proc.c:143:mca_btl_sctp_proc_create] mca_base_modex_recv: failed with return value=-13
[rs0.informatik.hs-fulda.de][[4083,1],3][../../../../../openmpi-1.9a1r27362/ompi/mca/btl/sctp/btl_sctp_proc.c:143:mca_btl_sctp_proc_create] mca_base_modex_recv: failed with return value=-13

This implies that at least some of the processes started and got all the way into
MPI_Init. You should probably exclude the sctp BTL as it's not necessarily working -
just add -mca btl ^sctp to the cmd line.
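For example, taking the same run that produced your attached output, that would look
something like:

  mpiexec -mca btl ^sctp -np 4 -host linpc4,sunpc4,rs0 environ_mpi \
    >& env_linpc_sunpc_sparc.txt

The same flag can be added to your java command lines as well.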
Does this work if you leave linpc out of it? I'm wondering if this is the heterogeneous
problem again. Are you sure that the /usr/local... OMPI library on that machine is the
Linux x86_64 version, and not the Solaris one (e.g., if /usr/local was NFS mounted)?
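One quick way to check (assuming the install on linpc4 lives under the same prefix you
gave configure) would be to run something like

  file /usr/local/openmpi-1.9_64_cc/lib64/libmpi.so

on that machine; the Linux build should be reported as a 64-bit x86-64 ELF object, not
a SPARC or Solaris one.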
On Wed, Sep 26, 2012 at 7:30 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hi,
>
> > I'm on the road the rest of this week, but can look at this when I return
> > next week. It looks like something unrelated to the Java bindings failed to
> > properly initialize - at a guess, I'd suspect that you are missing the
> > LD_LIBRARY_PATH setting so none of the OMPI libs were found.
>
> Perhaps the output of my environment program is helpful in that case.
> I attached my environment.
>
> mpiexec -np 4 -host linpc4,sunpc4,rs0 environ_mpi \
>   >& env_linpc_sunpc_sparc.txt
>
> Thank you very much for your help in advance.
>
>
> Kind regards
>
> Siegmar
>
>
> > On Wed, Sep 26, 2012 at 5:42 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
> >
> > > Hi,
> > >
> > > yesterday I installed openmpi-1.9a1r27362 on Solaris and Linux and
> > > I have a problem with mpiJava on Linux (openSUSE-Linux 12.1, x86_64).
> > >
> > > linpc4 mpi_classfiles 104 javac HelloMainWithoutMPI.java
> > > linpc4 mpi_classfiles 105 mpijavac HelloMainWithBarrier.java
> > > linpc4 mpi_classfiles 106 mpijavac -showme
> > > /usr/local/jdk1.7.0_07-64/bin/javac \
> > >   -cp ...:.:/usr/local/openmpi-1.9_64_cc/lib64/mpi.jar
> > >
> > > It works with Java without MPI.
> > >
> > > linpc4 mpi_classfiles 107 mpiexec java -cp $HOME/mpi_classfiles \
> > >   HelloMainWithoutMPI
> > > Hello from linpc4.informatik.hs-fulda.de/193.174.26.225
> > >
> > > It breaks with Java and MPI.
> > >
> > > linpc4 mpi_classfiles 108 mpiexec java -cp $HOME/mpi_classfiles \
> > >   HelloMainWithBarrier
> > > --------------------------------------------------------------------------
> > > It looks like opal_init failed for some reason; your parallel process is
> > > likely to abort. There are many reasons that a parallel process can
> > > fail during opal_init; some of which are due to configuration or
> > > environment problems. This failure appears to be an internal failure;
> > > here's some additional information (which may only be relevant to an
> > > Open MPI developer):
> > >
> > >   mca_base_open failed
> > >   --> Returned value -2 instead of OPAL_SUCCESS
> > > --------------------------------------------------------------------------
> > > --------------------------------------------------------------------------
> > > It looks like orte_init failed for some reason; your parallel process is
> > > likely to abort. There are many reasons that a parallel process can
> > > fail during orte_init; some of which are due to configuration or
> > > environment problems. This failure appears to be an internal failure;
> > > here's some additional information (which may only be relevant to an
> > > Open MPI developer):
> > >
> > >   opal_init failed
> > >   --> Returned value Out of resource (-2) instead of ORTE_SUCCESS
> > > --------------------------------------------------------------------------
> > > --------------------------------------------------------------------------
> > > It looks like MPI_INIT failed for some reason; your parallel process is
> > > likely to abort. There are many reasons that a parallel process can
> > > fail during MPI_INIT; some of which are due to configuration or environment
> > > problems. This failure appears to be an internal failure; here's some
> > > additional information (which may only be relevant to an Open MPI
> > > developer):
> > >
> > >   ompi_mpi_init: orte_init failed
> > >   --> Returned "Out of resource" (-2) instead of "Success" (0)
> > > --------------------------------------------------------------------------
> > > *** An error occurred in MPI_Init
> > > *** on a NULL communicator
> > > *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> > > ***   and potentially your MPI job)
> > > [linpc4:15332] Local abort before MPI_INIT completed successfully; not able to
> > > aggregate error messages, and not able to guarantee that all other processes
> > > were killed!
> > > -------------------------------------------------------
> > > Primary job terminated normally, but 1 process returned
> > > a non-zero exit code.. Per user-direction, the job has been aborted.
> > > -------------------------------------------------------
> > > --------------------------------------------------------------------------
> > > mpiexec detected that one or more processes exited with non-zero status,
> > > thus causing the job to be terminated. The first process to do so was:
> > >
> > >   Process name: [[58875,1],0]
> > >   Exit code:    1
> > > --------------------------------------------------------------------------
> > >
> > > I configured with the following command.
> > >
> > > ../openmpi-1.9a1r27362/configure --prefix=/usr/local/openmpi-1.9_64_cc \
> > >   --libdir=/usr/local/openmpi-1.9_64_cc/lib64 \
> > >   --with-jdk-bindir=/usr/local/jdk1.7.0_07-64/bin \
> > >   --with-jdk-headers=/usr/local/jdk1.7.0_07-64/include \
> > >   JAVA_HOME=/usr/local/jdk1.7.0_07-64 \
> > >   LDFLAGS="-m64" \
> > >   CC="cc" CXX="CC" FC="f95" \
> > >   CFLAGS="-m64" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
> > >   CPP="cpp" CXXCPP="cpp" \
> > >   CPPFLAGS="" CXXCPPFLAGS="" \
> > >   C_INCL_PATH="" C_INCLUDE_PATH="" CPLUS_INCLUDE_PATH="" \
> > >   OBJC_INCLUDE_PATH="" OPENMPI_HOME="" \
> > >   --enable-cxx-exceptions \
> > >   --enable-mpi-java \
> > >   --enable-heterogeneous \
> > >   --enable-opal-multi-threads \
> > >   --enable-mpi-thread-multiple \
> > >   --with-threads=posix \
> > >   --with-hwloc=internal \
> > >   --without-verbs \
> > >   --without-udapl \
> > >   --with-wrapper-cflags=-m64 \
> > >   --enable-debug \
> > >   |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc
> > >
> > > It works fine on Solaris machines as long as the hosts belong to the
> > > same kind (Sparc or x86_64).
> > >
> > > tyr mpi_classfiles 194 mpiexec -host sunpc0,sunpc1,sunpc4 \
> > >   java -cp $HOME/mpi_classfiles HelloMainWithBarrier
> > > Process 1 of 3 running on sunpc1
> > > Process 2 of 3 running on sunpc4.informatik.hs-fulda.de
> > > Process 0 of 3 running on sunpc0
> > >
> > > sunpc4 fd1026 107 mpiexec -host tyr,rs0,rs1 \
> > >   java -cp $HOME/mpi_classfiles HelloMainWithBarrier
> > > Process 1 of 3 running on rs0.informatik.hs-fulda.de
> > > Process 2 of 3 running on rs1.informatik.hs-fulda.de
> > > Process 0 of 3 running on tyr.informatik.hs-fulda.de
> > >
> > > It breaks if the hosts belong to both kinds of machines.
> > >
> > > sunpc4 fd1026 106 mpiexec -host tyr,rs0,sunpc1 \
> > >   java -cp $HOME/mpi_classfiles HelloMainWithBarrier
> > > [rs0.informatik.hs-fulda.de:7718] *** An error occurred in MPI_Comm_dup
> > > [rs0.informatik.hs-fulda.de:7718] *** reported by process [565116929,1]
> > > [rs0.informatik.hs-fulda.de:7718] *** on communicator MPI_COMM_WORLD
> > > [rs0.informatik.hs-fulda.de:7718] *** MPI_ERR_INTERN: internal error
> > > [rs0.informatik.hs-fulda.de:7718] *** MPI_ERRORS_ARE_FATAL (processes
> > >   in this communicator will now abort,
> > > [rs0.informatik.hs-fulda.de:7718] ***   and potentially your MPI job)
> > > [sunpc4.informatik.hs-fulda.de:07900] 1 more process has sent help
> > >   message help-mpi-errors.txt / mpi_errors_are_fatal
> > > [sunpc4.informatik.hs-fulda.de:07900] Set MCA parameter
> > >   "orte_base_help_aggregate" to 0 to see all help / error messages
> > >
> > > Please let me know if I can provide anything else to track these errors.
> > > Thank you very much for any help in advance.
> > >
> > > Kind regards
> > >
> > > Siegmar
> > >
> > > _______________________________________________
> > > users mailing list
> > > us...@open-mpi.org
> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> [sunpc4.informatik.hs-fulda.de][[4083,1],2][../../../../../openmpi-1.9a1r27362/ompi/mca/btl/sctp/btl_sctp_proc.c:143:mca_btl_sctp_proc_create] mca_base_modex_recv: failed with return value=-13
> [rs0.informatik.hs-fulda.de][[4083,1],3][../../../../../openmpi-1.9a1r27362/ompi/mca/btl/sctp/btl_sctp_proc.c:143:mca_btl_sctp_proc_create] mca_base_modex_recv: failed with return value=-13
> [rs0.informatik.hs-fulda.de][[4083,1],3][../../../../../openmpi-1.9a1r27362/ompi/mca/btl/sctp/btl_sctp_proc.c:143:mca_btl_sctp_proc_create] mca_base_modex_recv: failed with return value=-13
> [rs0.informatik.hs-fulda.de][[4083,1],3][../../../../../openmpi-1.9a1r27362/ompi/mca/btl/sctp/btl_sctp_proc.c:143:mca_btl_sctp_proc_create] mca_base_modex_recv: failed with return value=-13
>
> Now 3 slave tasks are sending their environment.
>
> Environment from task 1:
> message type: 3
> msg length: 3911 characters
> message:
> hostname: linpc4
> operating system: Linux
> release: 3.1.9-1.4-desktop
> processor: x86_64
> PATH
> /usr/local/eclipse-3.6.1
> /usr/local/NetBeans-4.0/bin
> /usr/local/jdk1.7.0_07-64/bin
> /usr/local/apache-ant-1.6.2/bin
> /usr/local/icc-9.1/idb/bin
> /usr/local/icc-9.1/cc/bin
> /usr/local/icc-9.1/fc/bin
> /usr/local/gcc-4.7.1/bin
> /opt/solstudio12.3/bin
> /usr/local/bin
> /usr/local/ssl/bin
> /usr/local/pgsql/bin
> /bin
> /usr/bin
> /usr/X11R6/bin
> /usr/local/teTeX-1.0.7/bin/i586-pc-linux-gnu
> /usr/local/bluej-2.1.2
> /usr/local/openmpi-1.9_64_cc/bin
> /home/fd1026/Linux/x86_64/bin
> .
> /usr/sbin
> LD_LIBRARY_PATH_32
> /usr/lib
> /usr/local/jdk1.7.0_07-64/jre/lib/i386
> /usr/local/gcc-4.7.1/lib
> /usr/local/gcc-4.7.1/libexec/gcc/x86_64-unknown-linux-gnu/4.7.1/32
> /usr/local/gcc-4.7.1/lib/gcc/x86_64-unknown-linux-gnu/4.7.1/32
> /usr/local/lib
> /usr/local/ssl/lib
> /lib
> /usr/lib
> /usr/X11R6/lib
> /usr/local/openmpi-1.9_64_cc/lib
> /home/fd1026/Linux/x86_64/lib
> LD_LIBRARY_PATH_64
> /usr/lib64
> /usr/local/jdk1.7.0_07-64/jre/lib/amd64
> /usr/local/gcc-4.7.1/lib64
> /usr/local/gcc-4.7.1/libexec/gcc/x86_64-unknown-linux-gnu/4.7.1
> /usr/local/gcc-4.7.1/lib/gcc/x86_64-unknown-linux-gnu/4.7.1
> /usr/local/lib64
> /usr/local/ssl/lib64
> /usr/lib64
> /usr/X11R6/lib64
> /usr/local/openmpi-1.9_64_cc/lib64
> /home/fd1026/Linux/x86_64/lib64
> LD_LIBRARY_PATH
> /usr/lib
> /usr/local/jdk1.7.0_07-64/jre/lib/i386
> /usr/local/gcc-4.7.1/lib
> /usr/local/gcc-4.7.1/libexec/gcc/x86_64-unknown-linux-gnu/4.7.1/32
> /usr/local/gcc-4.7.1/lib/gcc/x86_64-unknown-linux-gnu/4.7.1/32
> /usr/local/lib
> /usr/local/ssl/lib
> /lib
> /usr/lib
> /usr/X11R6/lib
> /usr/local/openmpi-1.9_64_cc/lib
> /usr/lib64
> /usr/local/jdk1.7.0_07-64/jre/lib/amd64
> /usr/local/gcc-4.7.1/lib64
> /usr/local/gcc-4.7.1/libexec/gcc/x86_64-unknown-linux-gnu/4.7.1
> /usr/local/gcc-4.7.1/lib/gcc/x86_64-unknown-linux-gnu/4.7.1
> /usr/local/lib64
> /usr/local/ssl/lib64
> /usr/lib64
> /usr/X11R6/lib64
> /usr/local/openmpi-1.9_64_cc/lib64
> /home/fd1026/Linux/x86_64/lib64
> CLASSPATH
> /usr/local/junit4.10
> /usr/local/junit4.10/junit-4.10.jar
> //usr/local/jdk1.7.0_07-64/j3d/lib/ext/j3dcore.jar
> //usr/local/jdk1.7.0_07-64/j3d/lib/ext/j3dutils.jar
> //usr/local/jdk1.7.0_07-64/j3d/lib/ext/vecmath.jar
> /usr/local/javacc-5.0/javacc.jar
> .
>
> Environment from task 2:
> message type: 3
> msg length: 4196 characters
> message:
> hostname: sunpc4.informatik.hs-fulda.de
> operating system: SunOS
> release: 5.10
> processor: i86pc
> PATH
> /usr/local/eclipse-3.6.1
> /usr/local/NetBeans-4.0/bin
> /usr/local/jdk1.7.0_07/bin/amd64
> /usr/local/apache-ant-1.6.2/bin
> /usr/local/gcc-4.7.1/bin
> /opt/solstudio12.3/bin
> /usr/local/bin
> /usr/local/ssl/bin
> /usr/local/pgsql/bin
> /usr/bin
> /usr/openwin/bin
> /usr/dt/bin
> /usr/ccs/bin
> /usr/sfw/bin
> /opt/sfw/bin
> /usr/ucb
> /usr/lib/lp/postscript
> /usr/local/teTeX-1.0.7/bin/i386-pc-solaris2.10
> /usr/local/bluej-2.1.2
> /usr/local/openmpi-1.9_64_cc/bin
> /home/fd1026/SunOS/x86_64/bin
> .
> /usr/sbin
> LD_LIBRARY_PATH_32
> /usr/lib
> /usr/local/jdk1.7.0_07/jre/lib/i386
> /usr/local/gcc-4.7.1/lib
> /usr/local/gcc-4.7.1/lib/gcc/i386-pc-solaris2.10/4.7.1
> /usr/local/lib
> /usr/local/ssl/lib
> /usr/local/oracle
> /usr/local/pgsql/lib
> /usr/lib
> /usr/openwin/lib
> /usr/openwin/server/lib
> /usr/dt/lib
> /usr/X11R6/lib
> /usr/ccs/lib
> /usr/sfw/lib
> /opt/sfw/lib
> /usr/ucblib
> /usr/local/openmpi-1.9_64_cc/lib
> /home/fd1026/SunOS/x86_64/lib
> LD_LIBRARY_PATH_64
> /usr/lib/amd64
> /usr/local/jdk1.7.0_07/jre/lib/amd64
> /usr/local/gcc-4.7.1/lib/amd64
> /usr/local/gcc-4.7.1/lib/gcc/i386-pc-solaris2.10/4.7.1/amd64
> /usr/local/lib/amd64
> /usr/local/ssl/lib/amd64
> /usr/local/lib64
> /usr/lib/amd64
> /usr/openwin/lib/amd64
> /usr/openwin/server/lib/amd64
> /usr/dt/lib/amd64
> /usr/X11R6/lib/amd64
> /usr/ccs/lib/amd64
> /usr/sfw/lib/amd64
> /opt/sfw/lib/amd64
> /usr/ucblib/amd64
> /usr/local/openmpi-1.9_64_cc/lib64
> /home/fd1026/SunOS/x86_64/lib64
> LD_LIBRARY_PATH
> /usr/lib/amd64
> /usr/local/jdk1.7.0_07/jre/lib/amd64
> /usr/local/gcc-4.7.1/lib/amd64
> /usr/local/gcc-4.7.1/lib/gcc/i386-pc-solaris2.10/4.7.1/amd64
> /usr/local/lib/amd64
> /usr/local/ssl/lib/amd64
> /usr/local/lib64
> /usr/lib/amd64
> /usr/openwin/lib/amd64
> /usr/openwin/server/lib/amd64
> /usr/dt/lib/amd64
> /usr/X11R6/lib/amd64
> /usr/ccs/lib/amd64
> /usr/sfw/lib/amd64
> /opt/sfw/lib/amd64
> /usr/ucblib/amd64
> /usr/local/openmpi-1.9_64_cc/lib64
> /home/fd1026/SunOS/x86_64/lib64
> CLASSPATH
> /usr/local/junit4.10
> /usr/local/junit4.10/junit-4.10.jar
> //usr/local/jdk1.7.0_07/j3d/lib/ext/j3dcore.jar
> //usr/local/jdk1.7.0_07/j3d/lib/ext/j3dutils.jar
> //usr/local/jdk1.7.0_07/j3d/lib/ext/vecmath.jar
> /usr/local/javacc-5.0/javacc.jar
> .
>
> Environment from task 3:
> message type: 3
> msg length: 4394 characters
> message:
> hostname: rs0.informatik.hs-fulda.de
> operating system: SunOS
> release: 5.10
> processor: sun4u
> PATH
> /usr/local/eclipse-3.6.1
> /usr/local/NetBeans-4.0/bin
> /usr/local/jdk1.7.0_07/bin/sparcv9
> /usr/local/apache-ant-1.6.2/bin
> /usr/local/gcc-4.7.1/bin
> /opt/solstudio12.3/bin
> /usr/local/bin
> /usr/local/ssl/bin
> /usr/local/pgsql/bin
> /usr/bin
> /usr/openwin/bin
> /usr/dt/bin
> /usr/ccs/bin
> /usr/sfw/bin
> /opt/sfw/bin
> /usr/ucb
> /usr/xpg4/bin
> /usr/local/teTeX-1.0.7/bin/sparc-sun-solaris2.10
> /usr/local/bluej-2.1.2
> /usr/local/openmpi-1.9_64_cc/bin
> /home/fd1026/SunOS/sparc/bin
> .
> /usr/sbin
> LD_LIBRARY_PATH_32
> /usr/lib
> /usr/local/jdk1.7.0_07/jre/lib/sparc
> /usr/local/gcc-4.7.1/lib
> /usr/local/gcc-4.7.1/lib/gcc/sparc-sun-solaris2.10/4.7.1
> /usr/local/lib
> /usr/local/ssl/lib
> /usr/local/oracle
> /usr/local/pgsql/lib
> /lib
> /usr/lib
> /usr/openwin/lib
> /usr/dt/lib
> /usr/X11R6/lib
> /usr/ccs/lib
> /usr/sfw/lib
> /opt/sfw/lib
> /usr/ucblib
> /usr/local/openmpi-1.9_64_cc/lib
> /home/fd1026/SunOS/sparc/lib
> LD_LIBRARY_PATH_64
> /usr/lib/sparcv9
> /usr/local/jdk1.7.0_07/jre/lib/sparcv9
> /usr/local/gcc-4.7.1/lib/sparcv9
> /usr/local/gcc-4.7.1/lib/gcc/sparc-sun-solaris2.10/4.7.1/sparcv9
> /usr/local/lib/sparcv9
> /usr/local/ssl/lib/sparcv9
> /usr/local/lib64
> /usr/local/oracle/sparcv9
> /usr/local/pgsql/lib/sparcv9
> /lib/sparcv9
> /usr/lib/sparcv9
> /usr/openwin/lib/sparcv9
> /usr/dt/lib/sparcv9
> /usr/X11R6/lib/sparcv9
> /usr/ccs/lib/sparcv9
> /usr/sfw/lib/sparcv9
> /opt/sfw/lib/sparcv9
> /usr/ucblib/sparcv9
> /usr/local/openmpi-1.9_64_cc/lib64
> /home/fd1026/SunOS/sparc/lib64
> LD_LIBRARY_PATH
> /usr/lib/sparcv9
> /usr/local/jdk1.7.0_07/jre/lib/sparcv9
> /usr/local/gcc-4.7.1/lib/sparcv9
> /usr/local/gcc-4.7.1/lib/gcc/sparc-sun-solaris2.10/4.7.1/sparcv9
> /usr/local/lib/sparcv9
> /usr/local/ssl/lib/sparcv9
> /usr/local/lib64
> /usr/local/oracle/sparcv9
> /usr/local/pgsql/lib/sparcv9
> /lib/sparcv9
> /usr/lib/sparcv9
> /usr/openwin/lib/sparcv9
> /usr/dt/lib/sparcv9
> /usr/X11R6/lib/sparcv9
> /usr/ccs/lib/sparcv9
> /usr/sfw/lib/sparcv9
> /opt/sfw/lib/sparcv9
> /usr/ucblib/sparcv9
> /usr/local/openmpi-1.9_64_cc/lib64
> /home/fd1026/SunOS/sparc/lib
> CLASSPATH
> /usr/local/junit4.10
> /usr/local/junit4.10/junit-4.10.jar
> //usr/local/jdk1.7.0_07/j3d/lib/ext/j3dcore.jar
> //usr/local/jdk1.7.0_07/j3d/lib/ext/j3dutils.jar
> //usr/local/jdk1.7.0_07/j3d/lib/ext/vecmath.jar
> /usr/local/javacc-5.0/javacc.jar
> .