Hi

> I'm a bit confused by your final table:
>
> > local machine                | -host
> >                              | sunpc1 | linpc1 | rs1
> > -----------------------------+--------+--------+-------
> > sunpc1 (Solaris 10, x86_64)  | ok     | hangs  | hangs
> > linpc1 (Solaris 10, x86_64)  | hangs  | ok     | ok
> > rs1    (Solaris 10, sparc)   | hangs  | ok     | ok
>
> Is linpc1 a Linux machine or Solaris machine?

I'm sorry for my copy-paste error! "linpc1" is an openSuSE Linux 12.1
machine.

> Ralph and I talked about this on the phone, and it seems like
> sunpc1 is just wrong somehow -- it just doesn't jive with the
> error message you sent.
>
> Can you verify that all 3 versions were built exactly the same
> way (e.g., debug or not debug)?

You are right! Somehow a line continuation character disappeared
in my configure command for "sunpc", so that "status.log" didn't
show "--enable-debug". I have rebuilt the package for "sunpc" and
now everything works fine. I'm sorry for all the unnecessary trouble.
Thank you very much for all your help.


Kind regards

Siegmar


> On May 29, 2013, at 10:31 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
>
> > Hello Ralph,
> >
> >> Could you please clarify - are you mixing 32 and 64 bit versions
> >> in your runs that have a problem?
> >
> > No, I have four different versions on each machine.
> >
> > tyr fd1026 1250 ls -ld /usr/local/openmpi-1.6.5_*
> > drwxr-xr-x 7 root root 512 May 23 14:00 /usr/local/openmpi-1.6.5_32_cc
> > drwxr-xr-x 7 root root 512 May 23 13:55 /usr/local/openmpi-1.6.5_32_gcc
> > drwxr-xr-x 7 root root 512 May 23 10:12 /usr/local/openmpi-1.6.5_64_cc
> > drwxr-xr-x 7 root root 512 May 23 12:21 /usr/local/openmpi-1.6.5_64_gcc
> >
> > "/usr/local" is a link to machine specific files on a NFS server.
> >
> > lrwxrwxrwx 1 root root 25 Jan 10 07:47 local -> /export2/prog/SunOS_sparc
> > lrwxrwxrwx 1 root root 26 Oct 5 2012 local -> /export2/prog/SunOS_x86_64
> > ...
> >
> > I can choose a package in my file "$HOME/.cshrc".
> >
> > tyr fd1026 1251 more .cshrc
> > ...
> > #set MPI = openmpi-1.6.5_32_cc
> > #set MPI = openmpi-1.6.5_32_gcc
> > #set MPI = openmpi-1.6.5_64_cc
> > #set MPI = openmpi-1.6.5_64_gcc
> > ...
> > source /opt/global/cshrc
> > ...
> >
> >
> > "/opt/global/cshrc" determines the processor architecture and operating
> > system and calls package specific initialization files.
> >
> > tyr fd1026 1258 more /opt/global/mpi.csh
> > ...
> > case openmpi-1.6.5_32_cc:
> > case openmpi-1.6.5_32_gcc:
> > case openmpi-1.6.5_64_cc:
> > case openmpi-1.6.5_64_gcc:
> > ...
> > if (($MPI == openmpi-1.7_32_cc) || ($MPI == openmpi-1.9_32_cc) || \
> >     ($MPI == ompi-java_32_cc) || ($MPI == ompi-java_32_gcc) || \
> >     ($MPI == openmpi-1.7_32_gcc) || ($MPI == openmpi-1.9_32_gcc)) then
> >   if ($JDK != jdk1.7.0_07-32) then
> >     echo " "
> >     echo "In '${MPI}' funktioniert 'mpijavac' nur mit"
> >     echo "'jdk1.7.0_07-32'. Waehlen Sie bitte das entsprechende"
> >     echo "Paket in der Datei '${HOME}/.cshrc' aus und melden Sie"
> >     echo "sich ab und wieder an, wenn Sie 'mpiJava' benutzen"
> >     echo "wollen."
> >     echo " "
> >   endif
> > endif
> > ...
> > setenv OPENMPI_HOME ${DIRPREFIX_PROG}/$MPI
> > ...
> > set path = ( $path ${OPENMPI_HOME}/bin )
> > ...
> >
> > Sorry for the German message in my shell script, but mpi.csh sets
> > all necessary environment variables for the selected package. I
> > must logout and login again, if I select a different package in
> > "$HOME/.cshrc", so that I never mix environments for different
> > packages, because my home directory and "/opt/global" are the
> > same on all machines (they are provided via an NFS server).
> >
> >
> >> If that isn't the case, then the error message is telling you that
> >> the system thinks you are mixing optimized and debug versions -
> >> i.e., one node is using an optimized version of OMPI and another
> >> is using a debug version. This also isn't allowed.
> >
> > I build my packages with copy-paste from a file. All configure
> > commands use "--enable-debug" (three different architectures with
> > two different compilers each).
> >
> > tyr openmpi-1.6.5 1263 grep -- enable-debug README-OpenMPI-1.6.5
> > --enable-debug \
> > --enable-debug \
> > --enable-debug \
> > --enable-debug \
> > --enable-debug \
> > --enable-debug \
> > tyr openmpi-1.6.5 1264
> >
> >
> >> If you check and find those two conditions are okay, then I suspect
> >> you are hitting the Solaris "bit rot" problem that we've talked
> >> about before - and are unlikely to be able to fix any time soon.
> >
> > sunpc1 hello_1 113 mpiexec -mca btl ^udapl -np 4 -host sunpc1 hello_1_mpi
> > Process 2 of 4 running on sunpc1
> > ...
> >
> >
> > sunpc1 hello_1 114 mpiexec -mca btl ^udapl -np 4 -host linpc1 hello_1_mpi
> > [sunpc1:05035] [[4165,0],0] ORTE_ERROR_LOG: Buffer type (described vs
> > non-described) mismatch - operation not allowed in file
> > ../../../../../openmpi-1.6.5a1r28554/orte/mca/grpcomm/bad/grpcomm_bad_module.c
> > at line 841
> > ^Cmpiexec: killing job...
> >
> >
> > I get the following table, if I use every machine as local machine
> > and run my command on one of my hosts.
> >
> >
> > local machine                | -host
> >                              |
> >                              | sunpc1 | linpc1 | rs1
> > -----------------------------+--------+--------+-------
> > sunpc1 (Solaris 10, x86_64)  | ok     | hangs  | hangs
> > linpc1 (Solaris 10, x86_64)  | hangs  | ok     | ok
> > rs1    (Solaris 10, sparc)   | hangs  | ok     | ok
> >
> >
> >
> > It seems that I have a problem with Solaris x86_64 and gcc-4.8.0,
> > if I use a 64-bit version of Open MPI. I have no problems with
> > Sun C and a 64-bit version of Open MPI or any 32-bit version of
> > Open MPI. Do you have any idea, what I can do to track the problem
> > and to get a solution?
> >
> >
> > Kind regards
> >
> > Siegmar
> >
> >
> >
> >> On May 24, 2013, at 12:02 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
> >>
> >>> Hi
> >>>
> >>> I installed openmpi-1.6.5a1r28554 on "openSuSE Linux 12.1", "Solaris 10
> >>> x86_64", and "Solaris 10 sparc" with gcc-4.8.0 and "Sun C 5.12" in 32-
> >>> and 64-bit versions.
> >>> Unfortunately I have a problem with the 64-bit
> >>> version, if I build Open MPI with gcc. The program hangs and I have
> >>> to terminate it with <Ctrl-c>.
> >>>
> >>>
> >>> sunpc1 hello_1 144 mpiexec -mca btl ^udapl -np 4 \
> >>>   -host sunpc1,linpc1,rs0 hello_1_mpi
> >>> [sunpc1:15576] [[16182,0],0] ORTE_ERROR_LOG: Buffer type (described vs
> >>> non-described) mismatch - operation not allowed in file
> >>> ../../../../../openmpi-1.6.5a1r28554/orte/mca/grpcomm/bad/grpcomm_bad_module.c
> >>> at line 841
> >>> ^Cmpiexec: killing job...
> >>>
> >>> sunpc1 hello_1 145 which mpiexec
> >>> /usr/local/openmpi-1.6.5_64_gcc/bin/mpiexec
> >>> sunpc1 hello_1 146
> >>>
> >>>
> >>> I have no problems with the 64-bit version, if I compile Open MPI
> >>> with Sun C. Both 32-bit versions (compiled with "cc" or "gcc") work
> >>> as expected as well.
> >>>
> >>> sunpc1 hello_1 106 mpiexec -mca btl ^udapl -np 4 \
> >>>   -host sunpc1,linpc1,rs0 hello_1_mpi
> >>> Process 2 of 4 running on rs0.informatik.hs-fulda.de
> >>> Process 0 of 4 running on sunpc1
> >>> Process 3 of 4 running on sunpc1
> >>> Process 1 of 4 running on linpc1
> >>> Now 3 slave tasks are sending greetings.
> >>> Greetings from task 3:
> >>>   message type: 3
> >>>   msg length: 116 characters
> >>>   message:
> >>>     hostname: sunpc1
> >>>     operating system: SunOS
> >>>     release: 5.10
> >>>     processor: i86pc
> >>> ...
> >>>
> >>> sunpc1 hello_1 107 which mpiexec
> >>> /usr/local/openmpi-1.6.5_64_cc/bin/mpiexec
> >>>
> >>>
> >>>
> >>> sunpc1 hello_1 106 mpiexec -mca btl ^udapl -np 4 \
> >>>   -host sunpc1,linpc1,rs0 hello_1_mpi
> >>> Process 2 of 4 running on rs0.informatik.hs-fulda.de
> >>> Process 3 of 4 running on sunpc1
> >>> Process 0 of 4 running on sunpc1
> >>> Process 1 of 4 running on linpc1
> >>> ...
> >>>
> >>> sunpc1 hello_1 107 which mpiexec
> >>> /usr/local/openmpi-1.6.5_32_gcc/bin/mpiexec
> >>>
> >>>
> >>> I would be grateful, if somebody can fix the problem for the
> >>> 64-bit version with gcc.
> >>> Thank you very much for any help in
> >>> advance.
> >>>
> >>>
> >>> Kind regards
> >>>
> >>> Siegmar
> >>>
> >>> _______________________________________________
> >>> users mailing list
> >>> us...@open-mpi.org
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
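
The cause reported at the top of this thread was a line continuation character that went missing from a configure command, so "--enable-debug" never reached configure for one build. A small check along the following lines could catch that before building. This is only a sketch under stated assumptions: the function names `logical_commands` and `configure_has_flag`, the prefix `/tmp/demo`, and the two sample commands are hypothetical illustrations, not the contents of the README file mentioned in the thread.

```python
def logical_commands(script_text):
    """Split shell text into logical commands.

    A physical line ending in a backslash is joined with the next line,
    which is how a shell treats a line continuation.
    """
    commands = []
    current = ""
    for line in script_text.splitlines():
        stripped = line.rstrip()
        if stripped.endswith("\\"):
            # Continuation: drop the backslash and keep collecting.
            current += stripped[:-1] + " "
        else:
            current += stripped
            if current.strip():
                commands.append(" ".join(current.split()))
            current = ""
    if current.strip():
        commands.append(" ".join(current.split()))
    return commands


def configure_has_flag(script_text, flag="--enable-debug"):
    """Return True if the first configure command really receives `flag`."""
    for command in logical_commands(script_text):
        words = command.split()
        if words and words[0].endswith("configure"):
            return flag in words
    return False


# Hypothetical sample commands (the prefix is made up for illustration).
GOOD_SCRIPT = "../configure --prefix=/tmp/demo CC=gcc \\\n    --enable-debug\n"
# Same text, but the continuation backslash after CC=gcc is missing.
BROKEN_SCRIPT = "../configure --prefix=/tmp/demo CC=gcc\n    --enable-debug\n"

if __name__ == "__main__":
    print("good script passes flag:  ", configure_has_flag(GOOD_SCRIPT))
    print("broken script passes flag:", configure_has_flag(BROKEN_SCRIPT))
```

A plain grep for the flag in the command file, as shown in the quoted message, still matches in the broken case, because the text is present even though it is no longer part of the configure command. Joining continued lines first is what exposes the difference.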