Scott,

This shows that the deadlock occurs during the local spawn. Here is how things are supposed to work: the mpirun process (the parent) forks a child, and the two processes are connected through a pipe. The child then execve's the desired command (hostname in your case), which closes the child's end of the pipe. The parent detects this event and translates it into a successful launch. If the execve fails, the child instead writes an error report into the pipe, so the parent knows something went wrong. Thus, in both cases the parent expects something to happen on the pipe soon after the fork, telling it how to handle the launch.
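To make that mechanism concrete, here is a minimal, self-contained sketch of the pattern (my own illustration, not the actual Open MPI code; the real parent-side read sits around odls_default_fork_local_proc / opal_fd_read in your backtrace). The child marks the write end of the pipe close-on-exec, so a successful exec closes it and the parent sees EOF, while a failed exec lets the child report its errno through the pipe:

/* Sketch of the launch check described above -- not Open MPI's code. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fds[2];
    if (pipe(fds) < 0) { perror("pipe"); return 1; }

    pid_t pid = fork();
    if (pid < 0) { perror("fork"); return 1; }

    if (pid == 0) {                          /* child */
        close(fds[0]);                       /* keep only the write end */
        fcntl(fds[1], F_SETFD, FD_CLOEXEC);  /* closed automatically on a successful exec */
        execlp("hostname", "hostname", (char *)NULL);
        /* only reached if the exec failed: report the errno to the parent */
        int err = errno;
        (void)write(fds[1], &err, sizeof(err));
        _exit(127);
    }

    /* parent */
    close(fds[1]);                           /* otherwise read() would never see EOF */
    int err = 0;
    ssize_t n = read(fds[0], &err, sizeof(err));  /* <- the read mpirun is blocked in */
    if (n == 0)
        printf("launch OK: the child exec'd and the pipe broke\n");
    else if (n > 0)
        printf("launch failed: %s\n", strerror(err));
    else
        perror("read");
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return 0;
}

In your trace the parent is sitting in that read() and never gets EOF nor an error report, which is why the next question is what the child is doing.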
If you block there, it means that the child process was somehow unable to execve the command, and mpirun is left waiting for something to happen on the pipe. So we need to look at the child to see what is going on there. Unfortunately this is complicated because, unlike gdb, lldb has weak support for easily tracking a forked child process. A very recent version of lldb (via brew or MacPorts) might give you access to `settings set target.process.follow-fork-mode child` (issued very early in the debugging session) to force lldb to follow the child instead of the parent. If not, you might want to install gdb to track the child process (more info here: https://sourceware.org/gdb/onlinedocs/gdb/Forks.html); a rough example of such a gdb session is sketched at the very bottom of this message, below the quoted text.

George.

On Wed, May 4, 2022 at 1:01 PM Scott Sayres <ssay...@asu.edu> wrote:

> Hi George, Thanks! You have just taught me a new trick. Although I do
> not yet understand the output, it is below:
>
> scottsayres@scotts-mbp ~ % lldb mpirun -- -np 1 hostname
>
> (lldb) target create "mpirun"
>
> Current executable set to 'mpirun' (arm64).
>
> (lldb) settings set -- target.run-args "-np" "1" "hostname"
>
> (lldb) run
>
> Process 14031 launched: '/opt/homebrew/bin/mpirun' (arm64)
>
> 2022-05-04 09:53:11.037030-0700 mpirun[14031:1194363]
> [CL_INVALID_OPERATION] : OpenCL Error : Failed to retrieve device
> information! Invalid enumerated value!
>
>
> 2022-05-04 09:53:11.037133-0700 mpirun[14031:1194363]
> [CL_INVALID_OPERATION] : OpenCL Error : Failed to retrieve device
> information! Invalid enumerated value!
>
>
> 2022-05-04 09:53:11.037142-0700 mpirun[14031:1194363]
> [CL_INVALID_OPERATION] : OpenCL Error : Failed to retrieve device
> information! Invalid enumerated value!
>
>
> Process 14031 stopped
>
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>
> frame #0: 0x00000001bde25654 libsystem_kernel.dylib`read + 8
>
> libsystem_kernel.dylib`read:
>
> -> 0x1bde25654 <+8>: b.lo 0x1bde25674 ; <+40>
>
> 0x1bde25658 <+12>: pacibsp
>
> 0x1bde2565c <+16>: stp x29, x30, [sp, #-0x10]!
>
> 0x1bde25660 <+20>: mov x29, sp
>
> Target 0: (mpirun) stopped.
>
> (lldb) thread backtrace
>
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>
> * frame #0: 0x00000001bde25654 libsystem_kernel.dylib`read + 8
>
> frame #1: 0x0000000100363620 libopen-pal.40.dylib`opal_fd_read + 52
>
> frame #2: 0x000000010784b418
> mca_odls_default.so`odls_default_fork_local_proc
> + 284
>
> frame #3: 0x00000001002c7914
> libopen-rte.40.dylib`orte_odls_base_spawn_proc
> + 968
>
> frame #4: 0x00000001003d96dc
> libevent_core-2.1.7.dylib`event_process_active_single_queue
> + 960
>
> frame #5: 0x00000001003d6584 libevent_core-2.1.7.dylib`event_base_loop
> + 952
>
> frame #6: 0x0000000100003cd8 mpirun`orterun + 216
>
> frame #7: 0x0000000100019088 dyld`start + 516
>
>
> On Wed, May 4, 2022 at 9:36 AM George Bosilca via users <
> users@lists.open-mpi.org> wrote:
>
>> I compiled a fresh copy of the 4.1.3 branch on my M1 laptop, and I can
>> run both MPI and non-MPI apps without any issues.
>>
>> Try running `lldb mpirun -- -np 1 hostname` and once it deadlocks, do a
>> CTRL+C to get back on the debugger and then `backtrace` to see where it is
>> waiting.
>>
>> George.
>>
>>
>> On Wed, May 4, 2022 at 11:28 AM Scott Sayres via users <
>> users@lists.open-mpi.org> wrote:
>>
>>> Thanks for looking at this Jeff.
>>> No, I cannot use mpirun to launch a non-MPI application. The command
>>> "mpirun -np 2 hostname" also hangs.
>>>
>>> I get the following output if I add the -d command before (I've
>>> replaced the server with the hashtags) :
>>>
>>> [scotts-mbp.3500.dhcp.###:05469] procdir:
>>> /var/folders/l0/94hsdtwj09xd62d90nfh_3h00000gn/T//ompi.scotts-mbp.501/pid.5469/0/0
>>>
>>> [scotts-mbp.3500.dhcp.###:05469] jobdir:
>>> /var/folders/l0/94hsdtwj09xd62d90nfh_3h00000gn/T//ompi.scotts-mbp.501/pid.5469/0
>>>
>>> [scotts-mbp.3500.dhcp.###:05469] top:
>>> /var/folders/l0/94hsdtwj09xd62d90nfh_3h00000gn/T//ompi.scotts-mbp.501/pid.5469
>>>
>>> [scotts-mbp.3500.dhcp.###:05469] top:
>>> /var/folders/l0/94hsdtwj09xd62d90nfh_3h00000gn/T//ompi.scotts-mbp.501
>>>
>>> [scotts-mbp.3500.dhcp.###:05469] tmp:
>>> /var/folders/l0/94hsdtwj09xd62d90nfh_3h00000gn/T/
>>>
>>> [scotts-mbp.3500.dhcp.###:05469] sess_dir_cleanup: job session dir does
>>> not exist
>>>
>>> [scotts-mbp.3500.dhcp.###:05469] sess_dir_cleanup: top session dir not
>>> empty - leaving
>>>
>>> [scotts-mbp.3500.dhcp.###:05469] procdir:
>>> /var/folders/l0/94hsdtwj09xd62d90nfh_3h00000gn/T//ompi.scotts-mbp.501/pid.5469/0/0
>>>
>>> [scotts-mbp.3500.dhcp.###:05469] jobdir:
>>> /var/folders/l0/94hsdtwj09xd62d90nfh_3h00000gn/T//ompi.scotts-mbp.501/pid.5469/0
>>>
>>> [scotts-mbp.3500.dhcp.###:05469] top:
>>> /var/folders/l0/94hsdtwj09xd62d90nfh_3h00000gn/T//ompi.scotts-mbp.501/pid.5469
>>>
>>> [scotts-mbp.3500.dhcp.###:05469] top:
>>> /var/folders/l0/94hsdtwj09xd62d90nfh_3h00000gn/T//ompi.scotts-mbp.501
>>>
>>> [scotts-mbp.3500.dhcp.###:05469] tmp:
>>> /var/folders/l0/94hsdtwj09xd62d90nfh_3h00000gn/T/
>>>
>>> [scotts-mbp.3500.dhcp.###:05469] [[48286,0],0] Releasing job data for
>>> [INVALID]
>>>
>>> Can you recommend a way to find where mpirun gets stuck?
>>> Thanks!
>>> Scott
>>>
>>> On Wed, May 4, 2022 at 6:06 AM Jeff Squyres (jsquyres) <
>>> jsquy...@cisco.com> wrote:
>>>
>>>> Are you able to use mpirun to launch a non-MPI application? E.g.:
>>>>
>>>> mpirun -np 2 hostname
>>>>
>>>> And if that works, can you run the simple example MPI apps in the
>>>> "examples" directory of the MPI source tarball (the "hello world" and
>>>> "ring" programs)? E.g.:
>>>>
>>>> cd examples
>>>> make
>>>> mpirun -np 4 hello_c
>>>> mpirun -np 4 ring_c
>>>>
>>>> --
>>>> Jeff Squyres
>>>> jsquy...@cisco.com
>>>>
>>>> ________________________________________
>>>> From: users <users-boun...@lists.open-mpi.org> on behalf of Scott
>>>> Sayres via users <users@lists.open-mpi.org>
>>>> Sent: Tuesday, May 3, 2022 1:07 PM
>>>> To: users@lists.open-mpi.org
>>>> Cc: Scott Sayres
>>>> Subject: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3
>>>>
>>>> Hello,
>>>> I am new to openmpi, but would like to use it for ORCA calculations,
>>>> and plan to run codes on the 10 processors of my macbook pro. I installed
>>>> this manually and also through homebrew with similar results. I am able to
>>>> compile codes with mpicc and run them as native codes, but everything that
>>>> I attempt with mpirun, mpiexec just freezes. I can end the program by
>>>> typing 'control C' twice, but it continues to run in the background and
>>>> requires me to 'kill <pid>'.
>>>> even as simple as 'mpirun uname' freezes
>>>>
>>>> I have tried one installation by: 'arch -arm64 brew install openmpi '
>>>> and a second by downloading the source file, './configure
>>>> --prefix=/usr/local', 'make all', make install
>>>>
>>>> the commands: 'which mpicc', 'which 'mpirun', etc are able to find them
>>>> on the path... it just hangs.
>>>>
>>>> Can anyone suggest how to fix the problem of the program hanging?
>>>> Thanks!
>>>> Scott
>>>>
>>>
>>>
>>> --
>>> Scott G Sayres
>>> Assistant Professor
>>> School of Molecular Sciences (formerly Department of Chemistry &
>>> Biochemistry)
>>> Biodesign Center for Applied Structural Discovery
>>> Arizona State University
>>>
>>
>
>
> --
> Scott G Sayres
> Assistant Professor
> School of Molecular Sciences (formerly Department of Chemistry &
> Biochemistry)
> Biodesign Center for Applied Structural Discovery
> Arizona State University
>
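P.S. Since I pointed you at gdb above, here is roughly the kind of session I have in mind, based on the follow-fork settings documented at the link. I have not verified this exact sequence on an M1 Mac, and getting a usable gdb there may need extra work (e.g. code-signing), so treat it as a sketch:

$ gdb --args mpirun -np 1 hostname
(gdb) set follow-fork-mode child    # stay with the forked child instead of mpirun
(gdb) set detach-on-fork off        # optional: keep the parent attached as well
(gdb) run
  ... wait for the hang, then hit CTRL+C ...
(gdb) info inferiors                # list the processes gdb is holding
(gdb) backtrace                     # where the current (child) process is stuck

The interesting backtrace is the child's: it should tell us whether it is stuck before, inside, or after the execve.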