Digging a little deeper by running the code in the lldb debugger, I found that
the stall occurs in a call to orte_init from ompi_mpi_init.c:
   356     /* Setup ORTE - note that we are an MPI process  */
   357     if (ORTE_SUCCESS != (ret = orte_init(NULL, NULL, ORTE_PROC_MPI))) {
   358         error = "ompi_mpi_init: orte_init failed";
   359         goto error;
   360     }

The code never returns from orte_init.

It gets stuck in orte_ess.init() called from orte_init.c:
   126     /* initialize the RTE for this environment */
   127     if (ORTE_SUCCESS != (ret = orte_ess.init())) {

When I step through orte_ess.init() in the lldb debugger, I actually get
some output from the code (there is no output when running without the
debugger):
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Unable to start a daemon on the local node" (-128) instead of "Success" (0)



Karl
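
The environment checks suggested further down in this thread (install prefix
first in PATH, and the lib directory present in DYLD_LIBRARY_PATH on OS X) can
be sketched as a small shell script. This is only a sketch: the helper names
first_entry and has_entry are made up for illustration and are not part of
Open MPI, and the prefix is assumed to be $HOME/tools/openmpi as in the
configure line quoted below.

```shell
#!/bin/sh
# Hypothetical helper: print the first entry of a colon-separated list.
first_entry() {
    printf '%s\n' "${1%%:*}"
}

# Hypothetical helper: succeed if directory $2 is an entry of the
# colon-separated list $1.
has_entry() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumed install prefix (taken from the configure line in this thread).
prefix="$HOME/tools/openmpi"

# The freshly built mpirun/mpic++ should win over any system copies.
if [ "$(first_entry "$PATH")" = "$prefix/bin" ]; then
    echo "PATH: $prefix/bin is first"
else
    echo "PATH: $prefix/bin is NOT first"
fi

# On OS X the dynamic loader reads DYLD_LIBRARY_PATH, not LD_LIBRARY_PATH.
if has_entry "${DYLD_LIBRARY_PATH:-}" "$prefix/lib"; then
    echo "DYLD_LIBRARY_PATH: contains $prefix/lib"
else
    echo "DYLD_LIBRARY_PATH: missing $prefix/lib"
fi
```

Passing both checks does not rule out other causes: in this thread the
variables were already set and the hang persisted.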



On Nov 25, 2013, at 9:20 AM, Meredith, Karl <karl.mered...@fmglobal.com> wrote:

> Here’s the back trace from lldb:
> $ )ps -elf | grep  hello
> 1042653210 45231 45230     4006   0  31  0  2448976   2148 -      S+          
>         0 ttys002    0:00.01 hello_cxx         9:07AM
> 1042653210 45232 45230     4006   0  31  0  2457168   2156 -      S+          
>         0 ttys002    0:00.04 hello_cxx         9:07AM
> 
> (meredithk@meredithk-mac)-(09:15 AM Mon Nov 
> 25)-(~/tools/openmpi-1.6.5/examples)
> $ )lldb -p 45231
> Attaching to process with:
>    process attach -p 45231
> Process 45231 stopped
> Executable module set to 
> "/Users/meredithk/tools/openmpi-1.6.5/examples/hello_cxx".
> Architecture set to: x86_64-apple-macosx.
> (lldb) bt
> * thread #1: tid = 0x168535, 0x00007fff8c1859aa 
> libsystem_kernel.dylib`select$DARWIN_EXTSN + 10, queue = 
> 'com.apple.main-thread, stop reason = signal SIGSTOP
>    frame #0: 0x00007fff8c1859aa libsystem_kernel.dylib`select$DARWIN_EXTSN + 
> 10
>    frame #1: 0x0000000106b73ea0 
> libmpi.1.dylib`select_dispatch(base=0x00007f84c3c0b430, 
> arg=0x00007f84c3c0b3e0, tv=0x00007fff5924ca70) + 80 at select.c:174
>    frame #2: 0x0000000106b3eb0f 
> libmpi.1.dylib`opal_event_base_loop(base=0x00007f84c3c0b430, flags=5) + 415 
> at event.c:838
> 
> Both processes are in this state.
> 
> Here’s the output from otool -L ./hello_cxx:
> 
> $ )otool -L ./hello_cxx
> ./hello_cxx:
>       /Users/meredithk/tools/openmpi/lib/libmpi_cxx.1.dylib (compatibility 
> version 2.0.0, current version 2.2.0)
>       /Users/meredithk/tools/openmpi/lib/libmpi.1.dylib (compatibility 
> version 2.0.0, current version 2.8.0)
>       /opt/local/lib/libgcc/libstdc++.6.dylib (compatibility version 7.0.0, 
> current version 7.18.0)
>       /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current 
> version 1197.1.1)
>       /opt/local/lib/libgcc/libgcc_s.1.dylib (compatibility version 1.0.0, 
> current version 1.0.0)
> 
> 
> On Nov 25, 2013, at 9:14 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
> 
>> Mac OS X 10.9 dropped support for gdb. Please report the output of lldb 
>> instead.
>> 
>> Also, can you run “otool -L ./hello_cxx” and report the output.
>> 
>> Thanks,
>>   George.
>> 
>> 
>> On Nov 25, 2013, at 15:09 , Meredith, Karl <karl.mered...@fmglobal.com> 
>> wrote:
>> 
>>> I do have DYLD_LIBRARY_PATH set to the same paths as LD_LIBRARY_PATH.  This 
>>> does not resolve the problem.  The code still hangs on MPI::Init().
>>> 
>>> Another thing I tried is I recompiled openmpi with the debug flags 
>>> activated:
>>> ./configure --prefix=$HOME/tools/openmpi --enable-debug
>>> make
>>> make install
>>> 
>>> Then, I attached to the running process using gdb.  I tried to get a 
>>> backtrace to see where it was hanging, but all I got was this:
>>> Attaching to process 45231
>>> Reading symbols from 
>>> /Users/meredithk/tools/openmpi-1.6.5/examples/hello_cxx...Reading symbols 
>>> from 
>>> /Users/meredithk/tools/openmpi-1.6.5/examples/hello_cxx.dSYM/Contents/Resources/DWARF/hello_cxx...done.
>>> done.
>>> 0x00007fff8c1859aa in ?? ()
>>> (gdb) bt
>>> #0  0x00007fff8c1859aa in ?? ()
>>> #1  0x0000000106b73ea0 in ?? ()
>>> #2  0x706d6e65706f2f2f in ?? ()
>>> #3  0x0000000000000001 in ?? ()
>>> #4  0x0000000000000000 in ?? ()
>>> 
>>> This output from gdb was not terribly helpful to me.
>>> 
>>> Karl
>>> 
>>> 
>>> On Nov 25, 2013, at 8:30 AM, Hammond, Simon David (-EXP) 
>>> <sdha...@sandia.gov> wrote:
>>> 
>>> We have occasionally had a problem like this when we set LD_LIBRARY_PATH 
>>> only. On OS X you may need to set DYLD_LIBRARY_PATH instead (set it to the 
>>> same lib directory).
>>> 
>>> Can you try that and see if it resolves the problem?
>>> 
>>> 
>>> 
>>> Si Hammond
>>> Sandia National Laboratories
>>> Remote Connection
>>> 
>>> 
>>> -----Original Message-----
>>> From: Meredith, Karl 
>>> [karl.mered...@fmglobal.com]
>>> Sent: Monday, November 25, 2013 06:25 AM Mountain Standard Time
>>> To: Open MPI Users
>>> Subject: [EXTERNAL] Re: [OMPI users] open-mpi on Mac OS 10.9 (Mavericks)
>>> 
>>> 
>>> I do have these two environment variables set:
>>> 
>>> LD_LIBRARY_PATH=/Users/meredithk/tools/openmpi/lib
>>> PATH=/Users/meredithk/tools/openmpi/bin
>>> 
>>> Running mpirun seems to work fine with a simple command, like hostname:
>>> 
>>> $ )mpirun -n 2 hostname
>>> meredithk-mac.corp.fmglobal.com
>>> meredithk-mac.corp.fmglobal.com
>>> 
>>> I am trying to run the simple hello_cxx example from the openmpi 
>>> distribution, compiled as such:
>>> mpic++ -g    hello_cxx.cc   -o hello_cxx
>>> 
>>> It compiles fine, without warning or error.  However, when I go to run the 
>>> example, it stalls on the MPI::Init() command:
>>> mpirun -np 1 hello_cxx
>>> It never errors out or crashes.  It simply hangs.
>>> 
>>> I am using the same mpic++ and mpirun version:
>>> $ )which mpirun
>>> /Users/meredithk/tools/openmpi/bin/mpirun
>>> 
>>> $ )which mpic++
>>> /Users/meredithk/tools/openmpi/bin/mpic++
>>> 
>>> Not quite sure what else to check.
>>> 
>>> Karl
>>> 
>>> 
>>> On Nov 23, 2013, at 5:29 PM, Ralph Castain 
>>> <r...@open-mpi.org> wrote:
>>> 
>>>> Strange - I run on Mavericks now without problem. Can you run "mpirun -n 1 
>>>> hostname"?
>>>> 
>>>> You also might want to check your PATH and LD_LIBRARY_PATH to ensure you 
>>>> have the prefix where you installed OMPI 1.6.5 at the front. Mac 
>>>> distributes a very old version of OMPI with its software and you don't 
>>>> want to pick it up by mistake.
>>>> 
>>>> 
>>>> On Nov 22, 2013, at 1:45 PM, Meredith, Karl 
>>>> <karl.mered...@fmglobal.com> wrote:
>>>> 
>>>>> I recently upgraded my 2013 MacBook Pro (Retina display) from 10.8 to 
>>>>> 10.9.  I downloaded and installed openmpi-1.6.5 and compiled it with gcc 
>>>>> 4.8 (gcc installed from macports).
>>>>> openmpi compiled and installed without error.
>>>>> 
>>>>> However, when I try to run any of the example test cases, the code gets 
>>>>> stuck inside the first MPI::Init() call and never returns.
>>>>> 
>>>>> Any thoughts on what might be going wrong?
>>>>> 
>>>>> The same install on OS 10.8 works fine and the example test cases run 
>>>>> without error.
>>>>> 
>>>>> Karl
>>>>> 
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
