Ah -- sorry, I missed this mail before I replied to the other thread (OS X Mail threaded them separately somehow...).

Sorry to ask you to dive deeper, but can you find out where in orte_ess.init() it's failing? orte_ess.init is actually a function pointer; it's a jump-off point into a dlopen'ed plugin.
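If it helps to picture what that means: the ess framework is resolved at runtime, so where the hang actually lives depends on which component got selected. The shape of the thing is roughly the following -- a simplified sketch with hypothetical names, not the actual ORTE source (the real code is in the orte/mca/ess framework, but the mechanics are the same):

  #include <dlfcn.h>
  #include <cstdio>

  // Sketch of the MCA plugin pattern with made-up names.
  struct ess_module_t {
      int (*init)(void);          // "orte_ess.init" is a pointer like this
  };

  static ess_module_t orte_ess;   // filled in when a component is selected

  static int orte_init_sketch(void)
  {
      // The MCA base dlopen()s the chosen component and copies its module
      // struct, so orte_ess.init ends up pointing into the plugin's code.
      void *handle = dlopen("mca_ess_someplugin.so", RTLD_NOW);  // hypothetical file name
      if (handle == NULL) {
          std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
          return -1;
      }

      void *sym = dlsym(handle, "ess_module");                   // hypothetical symbol
      if (sym == NULL) {
          return -1;
      }
      orte_ess = *static_cast<ess_module_t *>(sym);

      return orte_ess.init();     // the "jump-off point" -- a hang here is
                                  // really a hang in whichever plugin won
  }

  int main(void)
  {
      return orte_init_sketch() == 0 ? 0 : 1;
  }

So when you step into the orte_ess.init() call in lldb, the frame you land in should name the component's shared library, and that name is what I'm after. "thread backtrace all" after attaching is also useful, in case the interesting thread isn't the main one.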
On Nov 25, 2013, at 11:53 AM, "Meredith, Karl" <karl.mered...@fmglobal.com> wrote:

> Digging a little deeper by running the code under the lldb debugger, I found
> that the stall occurs in a call to orte_init() from ompi_mpi_init.c:
>
>   356    /* Setup ORTE - note that we are an MPI process */
>   357    if (ORTE_SUCCESS != (ret = orte_init(NULL, NULL, ORTE_PROC_MPI))) {
>   358        error = "ompi_mpi_init: orte_init failed";
>   359        goto error;
>   360    }
>
> The code never returns from orte_init().
>
> It gets stuck in orte_ess.init(), called from orte_init.c:
>
>   126    /* initialize the RTE for this environment */
>   127    if (ORTE_SUCCESS != (ret = orte_ess.init())) {
>
> When I step through orte_ess.init() in lldb, I actually get some output from
> the code (there is no output when running normally, outside the debugger):
>
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
>   ompi_mpi_init: orte_init failed
>   --> Returned "Unable to start a daemon on the local node" (-128) instead
>       of "Success" (0)
>
> Karl
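That "Unable to start a daemon on the local node" (-128) return is itself a clue: orte_init() is failing while starting the local runtime daemon, before MPI proper ever gets going. Independent of the debugger, it may be worth asking mpirun for more detail on the daemon launch; --debug-daemons is a standard mpirun option, and raising the verbosity of the PLM (the framework that launches daemons) usually shows where things go sideways. Something like:

  mpirun --debug-daemons -mca plm_base_verbose 5 -np 1 hello_cxx

(The exact verbosity level there is a guess on my part; any nonzero value should print something.)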
> On Nov 25, 2013, at 9:20 AM, "Meredith, Karl" <karl.mered...@fmglobal.com> wrote:
>
>> Here's the back trace from lldb:
>>
>>   $ )ps -elf | grep hello
>>   1042653210 45231 45230 4006 0 31 0 2448976 2148 - S+ 0 ttys002 0:00.01 hello_cxx 9:07AM
>>   1042653210 45232 45230 4006 0 31 0 2457168 2156 - S+ 0 ttys002 0:00.04 hello_cxx 9:07AM
>>
>>   (meredithk@meredithk-mac)-(09:15 AM Mon Nov 25)-(~/tools/openmpi-1.6.5/examples)
>>   $ )lldb -p 45231
>>   Attaching to process with:
>>       process attach -p 45231
>>   Process 45231 stopped
>>   Executable module set to "/Users/meredithk/tools/openmpi-1.6.5/examples/hello_cxx".
>>   Architecture set to: x86_64-apple-macosx.
>>   (lldb) bt
>>   * thread #1: tid = 0x168535, 0x00007fff8c1859aa libsystem_kernel.dylib`select$DARWIN_EXTSN + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>>       frame #0: 0x00007fff8c1859aa libsystem_kernel.dylib`select$DARWIN_EXTSN + 10
>>       frame #1: 0x0000000106b73ea0 libmpi.1.dylib`select_dispatch(base=0x00007f84c3c0b430, arg=0x00007f84c3c0b3e0, tv=0x00007fff5924ca70) + 80 at select.c:174
>>       frame #2: 0x0000000106b3eb0f libmpi.1.dylib`opal_event_base_loop(base=0x00007f84c3c0b430, flags=5) + 415 at event.c:838
>>
>> Both processes are in this state.
>>
>> Here's the output from otool -L ./hello_cxx:
>>
>>   $ )otool -L ./hello_cxx
>>   ./hello_cxx:
>>       /Users/meredithk/tools/openmpi/lib/libmpi_cxx.1.dylib (compatibility version 2.0.0, current version 2.2.0)
>>       /Users/meredithk/tools/openmpi/lib/libmpi.1.dylib (compatibility version 2.0.0, current version 2.8.0)
>>       /opt/local/lib/libgcc/libstdc++.6.dylib (compatibility version 7.0.0, current version 7.18.0)
>>       /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1197.1.1)
>>       /opt/local/lib/libgcc/libgcc_s.1.dylib (compatibility version 1.0.0, current version 1.0.0)
>>
>> On Nov 25, 2013, at 9:14 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>
>>> Mac OS X 10.9 dropped support for gdb. Please report the output of lldb
>>> instead.
>>>
>>> Also, can you run "otool -L ./hello_cxx" and report the output?
>>>
>>> Thanks,
>>> George.
>>>
>>> On Nov 25, 2013, at 15:09, Meredith, Karl <karl.mered...@fmglobal.com> wrote:
>>>
>>>> I do have DYLD_LIBRARY_PATH set to the same paths as LD_LIBRARY_PATH.
>>>> This does not resolve the problem. The code still hangs on MPI::Init().
>>>>
>>>> I also recompiled Open MPI with the debug flags activated:
>>>>
>>>>   ./configure --prefix=$HOME/tools/openmpi --enable-debug
>>>>   make
>>>>   make install
>>>>
>>>> Then I attached to the running process using gdb and tried to get a back
>>>> trace to see where it was hanging, but all I got was this:
>>>>
>>>>   Attaching to process 45231
>>>>   Reading symbols from /Users/meredithk/tools/openmpi-1.6.5/examples/hello_cxx...Reading symbols from /Users/meredithk/tools/openmpi-1.6.5/examples/hello_cxx.dSYM/Contents/Resources/DWARF/hello_cxx...done.
>>>>   done.
>>>>   0x00007fff8c1859aa in ?? ()
>>>>   (gdb) bt
>>>>   #0  0x00007fff8c1859aa in ?? ()
>>>>   #1  0x0000000106b73ea0 in ?? ()
>>>>   #2  0x706d6e65706f2f2f in ?? ()
>>>>   #3  0x0000000000000001 in ?? ()
>>>>   #4  0x0000000000000000 in ?? ()
>>>>
>>>> This output from gdb was not terribly helpful to me.
>>>>
>>>> Karl
>>>>
>>>> On Nov 25, 2013, at 8:30 AM, Hammond, Simon David (-EXP)
>>>> <sdha...@sandia.gov> wrote:
>>>>
>>>> We have occasionally had a problem like this when we set LD_LIBRARY_PATH
>>>> only. On OS X you may need to set DYLD_LIBRARY_PATH instead (set it to
>>>> the same lib directory).
>>>>
>>>> Can you try that and see if it resolves the problem?
>>>>
>>>> Si Hammond
>>>> Sandia National Laboratories
>>>> Remote Connection
>>>>
>>>> -----Original Message-----
>>>> From: Meredith, Karl [karl.mered...@fmglobal.com]
>>>> Sent: Monday, November 25, 2013 06:25 AM Mountain Standard Time
>>>> To: Open MPI Users
>>>> Subject: [EXTERNAL] Re: [OMPI users] open-mpi on Mac OS 10.9 (Mavericks)
>>>>
>>>> I do have these two environment variables set:
>>>>
>>>>   LD_LIBRARY_PATH=/Users/meredithk/tools/openmpi/lib
>>>>   PATH=/Users/meredithk/tools/openmpi/bin
>>>>
>>>> Running mpirun seems to work fine with a simple command, like hostname:
>>>>
>>>>   $ )mpirun -n 2 hostname
>>>>   meredithk-mac.corp.fmglobal.com
>>>>   meredithk-mac.corp.fmglobal.com
>>>>
>>>> I am trying to run the simple hello_cxx example from the Open MPI
>>>> distribution, compiled as such:
>>>>
>>>>   mpic++ -g hello_cxx.cc -o hello_cxx
>>>>
>>>> It compiles fine, without warning or error. However, when I go to run
>>>> the example, it stalls on the MPI::Init() command:
>>>>
>>>>   mpirun -np 1 hello_cxx
>>>>
>>>> It never errors out or crashes. It simply hangs.
>>>>
>>>> The mpirun and mpic++ I am using come from the same install:
>>>>
>>>>   $ )which mpirun
>>>>   /Users/meredithk/tools/openmpi/bin/mpirun
>>>>
>>>>   $ )which mpic++
>>>>   /Users/meredithk/tools/openmpi/bin/mpic++
>>>>
>>>> Not quite sure what else to check.
>>>>
>>>> Karl
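As a point of reference: hello_cxx is the trivial example from the tarball, so the application side really is minimal. Paraphrased from memory (not the exact file), it amounts to:

  #include <mpi.h>
  #include <iostream>

  // Paraphrase of examples/hello_cxx.cc -- C++ bindings, nothing exotic.
  int main(int argc, char* argv[])
  {
      MPI::Init(argc, argv);   // the reported hang is inside this call

      int rank = MPI::COMM_WORLD.Get_rank();
      int size = MPI::COMM_WORLD.Get_size();
      std::cout << "Hello, world!  I am " << rank << " of " << size
                << std::endl;

      MPI::Finalize();
      return 0;
  }

Which is to say: there is nothing in the test program itself that could hang; everything points at the runtime bring-up.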
>>>> On Nov 23, 2013, at 5:29 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>
>>>>> Strange - I run on Mavericks now without problem. Can you run
>>>>> "mpirun -n 1 hostname"?
>>>>>
>>>>> You also might want to check your PATH and LD_LIBRARY_PATH to ensure you
>>>>> have the prefix where you installed OMPI 1.6.5 at the front. Mac
>>>>> distributes a very old version of OMPI with its software, and you don't
>>>>> want to pick it up by mistake.
>>>>>
>>>>> On Nov 22, 2013, at 1:45 PM, Meredith, Karl <karl.mered...@fmglobal.com> wrote:
>>>>>
>>>>>> I recently upgraded my 2013 MacBook Pro (Retina display) from 10.8 to
>>>>>> 10.9. I downloaded openmpi-1.6.5 and compiled it with gcc 4.8 (gcc
>>>>>> installed from MacPorts). Open MPI compiled and installed without error.
>>>>>>
>>>>>> However, when I try to run any of the example test cases, the code gets
>>>>>> stuck inside the first MPI::Init() call and never returns.
>>>>>>
>>>>>> Any thoughts on what might be going wrong?
>>>>>>
>>>>>> The same install on OS 10.8 works fine, and the example test cases run
>>>>>> without error.
>>>>>>
>>>>>> Karl

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/