That is weird, but it may not be a deadlock, just very slow progress. In the child, can you print fdmax and i in the do_child frame?
George.

On Thu, May 5, 2022 at 11:50 AM Scott Sayres via users <users@lists.open-mpi.org> wrote:

> Jeff, thanks.
> from 1:
>
> (lldb) process attach --pid 95083
> Process 95083 stopped
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>     frame #0: 0x00000001bde25628 libsystem_kernel.dylib`close + 8
> libsystem_kernel.dylib`close:
> ->  0x1bde25628 <+8>:  b.lo   0x1bde25648 ; <+40>
>     0x1bde2562c <+12>: pacibsp
>     0x1bde25630 <+16>: stp    x29, x30, [sp, #-0x10]!
>     0x1bde25634 <+20>: mov    x29, sp
> Target 0: (orterun) stopped.
> Executable module set to "/usr/local/bin/orterun".
> Architecture set to: arm64e-apple-macosx-.
> (lldb) thread backtrace
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>   * frame #0: 0x00000001bde25628 libsystem_kernel.dylib`close + 8
>     frame #1: 0x0000000101563074 mca_odls_default.so`do_child(cd=0x0000600001e28000, write_fd=40) at odls_default_module.c:410:17
>     frame #2: 0x0000000101562d7c mca_odls_default.so`odls_default_fork_local_proc(cdptr=0x0000600001e28000) at odls_default_module.c:646:9
>     frame #3: 0x0000000100e2c6f8 libopen-rte.40.dylib`orte_odls_base_spawn_proc(fd=-1, sd=4, cbdata=0x0000600001e28000) at odls_base_default_fns.c:1046:31
>     frame #4: 0x00000001011827a0 libopen-pal.40.dylib`opal_libevent2022_event_base_loop [inlined] event_process_active_single_queue(base=0x000000010df069d0) at event.c:1370:4 [opt]
>     frame #5: 0x0000000101182628 libopen-pal.40.dylib`opal_libevent2022_event_base_loop [inlined] event_process_active(base=0x000000010df069d0) at event.c:1440:8 [opt]
>     frame #6: 0x00000001011825ec libopen-pal.40.dylib`opal_libevent2022_event_base_loop(base=0x000000010df069d0, flags=<unavailable>) at event.c:1644:12 [opt]
>     frame #7: 0x0000000100bbfb04 orterun`orterun(argc=4, argv=0x000000016f2432f8) at orterun.c:179:9
>     frame #8: 0x0000000100bbf904 orterun`main(argc=4, argv=0x000000016f2432f8) at main.c:13:12
>     frame #9: 0x0000000100f19088 dyld`start + 516
>
> from 2:
>
> scottsayres@scotts-mbp ~ % lldb -p 95082
> (lldb) process attach --pid 95082
> Process 95082 stopped
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>     frame #0: 0x00000001bde25654 libsystem_kernel.dylib`read + 8
> libsystem_kernel.dylib`read:
> ->  0x1bde25654 <+8>:  b.lo   0x1bde25674 ; <+40>
>     0x1bde25658 <+12>: pacibsp
>     0x1bde2565c <+16>: stp    x29, x30, [sp, #-0x10]!
>     0x1bde25660 <+20>: mov    x29, sp
> Target 0: (orterun) stopped.
> Executable module set to "/usr/local/bin/orterun".
> Architecture set to: arm64e-apple-macosx-.
> (lldb) thread backtrace
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>   * frame #0: 0x00000001bde25654 libsystem_kernel.dylib`read + 8
>     frame #1: 0x000000010116969c libopen-pal.40.dylib`opal_fd_read(fd=22, len=20, buffer=0x000000016f24299c) at fd.c:51:14
>     frame #2: 0x0000000101563388 mca_odls_default.so`do_parent(cd=0x0000600001e28200, read_fd=22) at odls_default_module.c:495:14
>     frame #3: 0x0000000101562d90 mca_odls_default.so`odls_default_fork_local_proc(cdptr=0x0000600001e28200) at odls_default_module.c:651:12
>     frame #4: 0x0000000100e2c6f8 libopen-rte.40.dylib`orte_odls_base_spawn_proc(fd=-1, sd=4, cbdata=0x0000600001e28200) at odls_base_default_fns.c:1046:31
>     frame #5: 0x00000001011827a0 libopen-pal.40.dylib`opal_libevent2022_event_base_loop [inlined] event_process_active_single_queue(base=0x000000010df069d0) at event.c:1370:4 [opt]
>     frame #6: 0x0000000101182628 libopen-pal.40.dylib`opal_libevent2022_event_base_loop [inlined] event_process_active(base=0x000000010df069d0) at event.c:1440:8 [opt]
>     frame #7: 0x00000001011825ec libopen-pal.40.dylib`opal_libevent2022_event_base_loop(base=0x000000010df069d0, flags=<unavailable>) at event.c:1644:12 [opt]
>     frame #8: 0x0000000100bbfb04 orterun`orterun(argc=4, argv=0x000000016f2432f8) at orterun.c:179:9
>     frame #9: 0x0000000100bbf904 orterun`main(argc=4, argv=0x000000016f2432f8) at main.c:13:12
>     frame #10: 0x0000000100f19088 dyld`start + 516
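
A rough way to judge "slow progress" versus "deadlock" for the child in trace 1: if do_child is closing descriptors in a loop bounded by fdmax (an inference from the close frame at odls_default_module.c:410 and the fdmax and i variables, not something the traces confirm), then the number of close() calls grows with the descriptor limit. The Python sketch below reads the soft RLIMIT_NOFILE limit of the current shell and estimates how long such a loop might take. The helper names, the starting descriptor of 3, and the 1 microsecond per-call cost are all hypothetical choices for illustration, and the limit orterun actually sees may come from a different source.

```python
import resource


def fd_limit():
    """Soft RLIMIT_NOFILE for this process, or None if it is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return None if soft == resource.RLIM_INFINITY else soft


def close_calls_needed(fdmax, first_fd=3):
    """close() calls issued by a loop over descriptors first_fd .. fdmax-1.

    first_fd=3 (skip stdin/stdout/stderr) is an assumption, not taken
    from the Open MPI source.
    """
    return max(0, fdmax - first_fd)


def estimated_seconds(fdmax, seconds_per_close, first_fd=3):
    """Extrapolated loop duration for a given (assumed) per-call cost."""
    return close_calls_needed(fdmax, first_fd) * seconds_per_close


if __name__ == "__main__":
    limit = fd_limit()
    if limit is None:
        print("descriptor limit: unlimited, so a close loop could be very long")
    else:
        # 1e-6 s per close() is a placeholder; measure it on the real machine.
        print("descriptor limit:", limit)
        print("close() calls needed:", close_calls_needed(limit))
        print("estimated seconds at 1 us/call:", estimated_seconds(limit, 1e-6))
```

If the printed limit is very large or unlimited, comparing it with the value of i in the do_child frame at two different times would show whether the loop is advancing.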