Hi

> I'm a bit confused by your final table:
> 
> > local machine                | -host
> >                              | sunpc1 | linpc1 | rs1
> > -----------------------------+--------+--------+-------
> > sunpc1 (Solaris 10, x86_64)  | ok     | hangs  | hangs
> > linpc1 (Solaris 10, x86_64)  | hangs  | ok     | ok
> > rs1    (Solaris 10, sparc)   | hangs  | ok     | ok
> 
> Is linpc1 a Linux machine or Solaris machine?

I'm sorry for my copy-paste error! "linpc1" is an openSuSE
Linux 12.1 machine.


> Ralph and I talked about this on the phone, and it seems like
> sunpc1 is just wrong somehow -- it just doesn't jibe with the
> error message you sent.
> 
> Can you verify that all 3 versions were built exactly the same
> way (e.g., debug or not debug)?

You are right! Somehow a line continuation character disappeared
in my configure command for "sunpc", so that "status.log" didn't
show "--enable-debug". I have rebuilt the package for "sunpc" and
now everything works fine. I'm sorry for all the unnecessary trouble.
Thank you very much for all your help.
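
For anyone who runs into the same mismatch, here is a minimal sketch of a check that would have caught it. It assumes every build keeps a log file containing the configure command line that was actually executed. The function names and the log paths in the usage comment are hypothetical, and the match is a plain substring search, so an option that merely starts with the same text would also count as a hit.

```shell
#!/bin/sh
# Sketch only: has_option and check_builds are hypothetical helper names.

# Succeed if FILE contains OPTION as a fixed string.
# Usage: has_option FILE OPTION
has_option() {
    # -q: no output, -F: fixed string, --: end of grep options, so the
    # pattern itself may start with "--"
    grep -qF -- "$2" "$1"
}

# Print one line per log: "<file>: ok" or "<file>: MISSING <option>".
# Usage: check_builds OPTION FILE...
check_builds() {
    option=$1
    shift
    for log in "$@"; do
        if has_option "$log" "$option"; then
            echo "$log: ok"
        else
            echo "$log: MISSING $option"
        fi
    done
}

# Example call (paths are assumptions, adapt them to your build tree):
# check_builds --enable-debug build-sunpc/status.log build-linpc/status.log
```

Checking the logs of the builds that were really run, rather than the file the commands were copied from, is the point of the sketch: the source file can be correct while a pasted command has lost a continuation character.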


Kind regards

Siegmar



> On May 29, 2013, at 10:31 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
> 
> > Hello Ralph,
> > 
> >> Could you please clarify - are you mixing 32 and 64 bit versions
> >> in your runs that have a problem?
> > 
> > No, I have four different versions on each machine.
> > 
> > tyr fd1026 1250 ls -ld /usr/local/openmpi-1.6.5_*
> > drwxr-xr-x 7 root root 512 May 23 14:00 /usr/local/openmpi-1.6.5_32_cc
> > drwxr-xr-x 7 root root 512 May 23 13:55 /usr/local/openmpi-1.6.5_32_gcc
> > drwxr-xr-x 7 root root 512 May 23 10:12 /usr/local/openmpi-1.6.5_64_cc
> > drwxr-xr-x 7 root root 512 May 23 12:21 /usr/local/openmpi-1.6.5_64_gcc
> > 
> > "/usr/local" is a link to machine specific files on an NFS server.
> > 
> > lrwxrwxrwx 1 root root 25 Jan 10 07:47 local -> /export2/prog/SunOS_sparc
> > lrwxrwxrwx 1 root root 26 Oct  5  2012 local -> /export2/prog/SunOS_x86_64
> > ...
> > 
> > I can choose a package in my file "$HOME/.cshrc".
> > 
> > tyr fd1026 1251 more .cshrc
> > ...
> > #set MPI = openmpi-1.6.5_32_cc
> > #set MPI = openmpi-1.6.5_32_gcc
> > #set MPI = openmpi-1.6.5_64_cc
> > #set MPI = openmpi-1.6.5_64_gcc
> > ...
> > source /opt/global/cshrc
> > ...
> > 
> > 
> > "/opt/global/cshrc" determines the processor architecture and operating
> > system and calls package specific initialization files.
> > 
> > tyr fd1026 1258 more /opt/global/mpi.csh 
> > ...
> >    case openmpi-1.6.5_32_cc:
> >    case openmpi-1.6.5_32_gcc:
> >    case openmpi-1.6.5_64_cc:
> >    case openmpi-1.6.5_64_gcc:
> > ...
> >      if (($MPI == openmpi-1.7_32_cc) || ($MPI == openmpi-1.9_32_cc) || \
> >          ($MPI == ompi-java_32_cc) || ($MPI == ompi-java_32_gcc) || \
> >          ($MPI == openmpi-1.7_32_gcc) || ($MPI == openmpi-1.9_32_gcc)) then
> >        if ($JDK != jdk1.7.0_07-32) then
> >          echo " "
> >          echo "In '${MPI}' funktioniert 'mpijavac' nur mit"
> >          echo "'jdk1.7.0_07-32'. Waehlen Sie bitte das entsprechende"
> >          echo "Paket in der Datei '${HOME}/.cshrc' aus und melden Sie"
> >          echo "sich ab und wieder an, wenn Sie 'mpiJava' benutzen"
> >          echo "wollen."
> >          echo " "
> >        endif
> >      endif
> > ...
> >      setenv OPENMPI_HOME  ${DIRPREFIX_PROG}/$MPI
> > ...
> >      set path = ( $path ${OPENMPI_HOME}/bin )
> > ...
> > 
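> > Sorry for the German message in my shell script (it says that
> > "mpijavac" works only with "jdk1.7.0_07-32" in these packages and asks
> > the user to select the matching package in "$HOME/.cshrc" and to log
> > out and in again in order to use "mpiJava"). mpi.csh sets all
> > necessary environment variables for the selected package. I must log
> > out and log in again if I select a different package in
> > "$HOME/.cshrc", so I never mix environments for different packages,
> > because my home directory and "/opt/global" are the same on all
> > machines (they are provided via an NFS server).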
> > 
> > 
> >> If that isn't the case, then the error message is telling you that
> >> the system thinks you are mixing optimized and debug versions -
> >> i.e., one node is using an optimized version of OMPI and another
> >> is using a debug version. This also isn't allowed.
> > 
> > I build my packages with copy-paste from a file. All configure
> > commands use "--enable-debug" (three different architectures with
> > two different compilers each).
> > 
> > tyr openmpi-1.6.5 1263 grep -- enable-debug README-OpenMPI-1.6.5 
> >  --enable-debug \
> >  --enable-debug \
> >  --enable-debug \
> >  --enable-debug \
> >  --enable-debug \
> >  --enable-debug \
> > tyr openmpi-1.6.5 1264 
> > 
> > 
> >> If you check and find those two conditions are okay, then I suspect
> >> you are hitting the Solaris "bit rot" problem that we've talked
> >> about before - and are unlikely to be able to fix any time soon.
> > 
> > sunpc1 hello_1 113 mpiexec -mca btl ^udapl -np 4 -host sunpc1 hello_1_mpi
> > Process 2 of 4 running on sunpc1
> > ...
> > 
> > 
> > sunpc1 hello_1 114 mpiexec -mca btl ^udapl -np 4 -host linpc1 hello_1_mpi
> > [sunpc1:05035] [[4165,0],0] ORTE_ERROR_LOG: Buffer type (described vs
> > non-described) mismatch - operation not allowed in file
> > ../../../../../openmpi-1.6.5a1r28554/orte/mca/grpcomm/bad/grpcomm_bad_module.c
> > at line 841
> > ^Cmpiexec: killing job...
> > 
> > 
> > I get the following table if I use each machine in turn as the local
> > machine and run my command on one of my hosts.
> > 
> > 
> > local machine                | -host
> >                              |
> >                              | sunpc1 | linpc1 | rs1
> > -----------------------------+--------+--------+-------
> > sunpc1 (Solaris 10, x86_64)  | ok     | hangs  | hangs
> > linpc1 (Solaris 10, x86_64)  | hangs  | ok     | ok
> > rs1    (Solaris 10, sparc)   | hangs  | ok     | ok
> > 
> > 
> > 
> > It seems that I have a problem with Solaris x86_64 and gcc-4.8.0
> > if I use a 64-bit version of Open MPI. I have no problems with
> > Sun C and a 64-bit version of Open MPI or with any 32-bit version of
> > Open MPI. Do you have any idea what I can do to track down the
> > problem and find a solution?
> > 
> > 
> > Kind regards
> > 
> > Siegmar
> > 
> > 
> > 
> >> On May 24, 2013, at 12:02 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
> >> 
> >>> Hi
> >>> 
> >>> I installed openmpi-1.6.5a1r28554 on "openSuSE Linux 12.1", "Solaris 10
> >>> x86_64", and "Solaris 10 sparc" with gcc-4.8.0 and "Sun C 5.12" in 32-
> >>> and 64-bit versions. Unfortunately, I have a problem with the 64-bit
> >>> version if I build Open MPI with gcc. The program hangs and I have
> >>> to terminate it with <Ctrl-c>.
> >>> 
> >>> 
> >>> sunpc1 hello_1 144 mpiexec -mca btl ^udapl -np 4 \
> >>> -host sunpc1,linpc1,rs0 hello_1_mpi
> >>> [sunpc1:15576] [[16182,0],0] ORTE_ERROR_LOG: Buffer type (described vs
> >>> non-described) mismatch - operation not allowed in file
> >>> ../../../../../openmpi-1.6.5a1r28554/orte/mca/grpcomm/bad/grpcomm_bad_module.c
> >>> at line 841
> >>> ^Cmpiexec: killing job...
> >>> 
> >>> sunpc1 hello_1 145 which mpiexec
> >>> /usr/local/openmpi-1.6.5_64_gcc/bin/mpiexec
> >>> sunpc1 hello_1 146 
> >>> 
> >>> 
> >>> I have no problems with the 64-bit version if I compile Open MPI
> >>> with Sun C. Both 32-bit versions (compiled with "cc" or "gcc") work
> >>> as expected as well.
> >>> 
> >>> sunpc1 hello_1 106 mpiexec -mca btl ^udapl -np 4 \
> >>> -host sunpc1,linpc1,rs0 hello_1_mpi
> >>> Process 2 of 4 running on rs0.informatik.hs-fulda.de
> >>> Process 0 of 4 running on sunpc1
> >>> Process 3 of 4 running on sunpc1
> >>> Process 1 of 4 running on linpc1
> >>> Now 3 slave tasks are sending greetings.
> >>> Greetings from task 3:
> >>> message type:        3
> >>> msg length:          116 characters
> >>> message:             
> >>>   hostname:          sunpc1
> >>>   operating system:  SunOS
> >>>   release:           5.10
> >>>   processor:         i86pc
> >>> ...
> >>> 
> >>> sunpc1 hello_1 107 which mpiexec
> >>> /usr/local/openmpi-1.6.5_64_cc/bin/mpiexec
> >>> 
> >>> 
> >>> 
> >>> sunpc1 hello_1 106 mpiexec -mca btl ^udapl -np 4 \
> >>> -host sunpc1,linpc1,rs0 hello_1_mpi
> >>> Process 2 of 4 running on rs0.informatik.hs-fulda.de
> >>> Process 3 of 4 running on sunpc1
> >>> Process 0 of 4 running on sunpc1
> >>> Process 1 of 4 running on linpc1
> >>> ...
> >>> 
> >>> sunpc1 hello_1 107 which mpiexec
> >>> /usr/local/openmpi-1.6.5_32_gcc/bin/mpiexec
> >>> 
> >>> 
> >>> I would be grateful if somebody could fix the problem for the
> >>> 64-bit version with gcc. Thank you very much in advance for any
> >>> help.
> >>> 
> >>> 
> >>> Kind regards
> >>> 
> >>> Siegmar
> >>> 
> >>> _______________________________________________
> >>> users mailing list
> >>> us...@open-mpi.org
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >> 
> >> 
> > 
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
