That is weird, but maybe it is not a deadlock, just very slow progress. In
the child, can you print fdmax and i in the do_child frame?
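
To make the suspicion concrete: the child's backtrace shows do_child sitting
in close(), so a descriptor-closing loop bounded by fdmax is a plausible
culprit. The loop itself is not shown in this thread, so what follows is only
a minimal sketch of that pattern, not the Open MPI code; close_fds_from and
query_fdmax are hypothetical names, and the 1024 fallback is an assumption.
In lldb, `frame select 1` followed by `frame variable fdmax i` should show
both values in the child.

```c
#include <limits.h>
#include <unistd.h>

/* Hypothetical helper, not Open MPI code: close every descriptor in
 * [first, fdmax) one at a time. Returns how many close() calls succeeded. */
static long close_fds_from(int first, long fdmax)
{
    long closed = 0;

    if (fdmax > INT_MAX) {
        fdmax = INT_MAX;            /* descriptors are ints */
    }
    for (long i = first; i < fdmax; i++) {
        /* A failing close() only means i was not open, but the loop
         * still pays for one system call per candidate descriptor. */
        if (close((int)i) == 0) {
            closed++;
        }
    }
    return closed;
}

/* Hypothetical helper: one common way to choose the upper bound. */
static long query_fdmax(void)
{
    long fdmax = sysconf(_SC_OPEN_MAX);
    return (fdmax < 0) ? 1024 : fdmax;  /* fallback is an assumption */
}
```

With fdmax in the low thousands such a loop finishes almost instantly. If
fdmax turns out to be in the millions or more, the child would appear stuck
in close() for a long time, which is why i relative to fdmax is worth
checking.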

George.

On Thu, May 5, 2022 at 11:50 AM Scott Sayres via users <
users@lists.open-mpi.org> wrote:

> Jeff, thanks.
> from 1:
>
> (lldb) process attach --pid 95083
>
> Process 95083 stopped
>
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>
>     frame #0: 0x00000001bde25628 libsystem_kernel.dylib`close + 8
>
> libsystem_kernel.dylib`close:
>
> ->  0x1bde25628 <+8>:  b.lo   0x1bde25648               ; <+40>
>
>     0x1bde2562c <+12>: pacibsp
>
>     0x1bde25630 <+16>: stp    x29, x30, [sp, #-0x10]!
>
>     0x1bde25634 <+20>: mov    x29, sp
>
> Target 0: (orterun) stopped.
>
> Executable module set to "/usr/local/bin/orterun".
>
> Architecture set to: arm64e-apple-macosx-.
>
> (lldb) thread backtrace
>
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>
>   * frame #0: 0x00000001bde25628 libsystem_kernel.dylib`close + 8
>
>     frame #1: 0x0000000101563074
> mca_odls_default.so`do_child(cd=0x0000600001e28000, write_fd=40) at
> odls_default_module.c:410:17
>
>     frame #2: 0x0000000101562d7c
> mca_odls_default.so`odls_default_fork_local_proc(cdptr=0x0000600001e28000)
> at odls_default_module.c:646:9
>
>     frame #3: 0x0000000100e2c6f8
> libopen-rte.40.dylib`orte_odls_base_spawn_proc(fd=-1, sd=4,
> cbdata=0x0000600001e28000) at odls_base_default_fns.c:1046:31
>
>     frame #4: 0x00000001011827a0
> libopen-pal.40.dylib`opal_libevent2022_event_base_loop [inlined]
> event_process_active_single_queue(base=0x000000010df069d0) at event.c:1370:4 [opt]
>
>     frame #5: 0x0000000101182628
> libopen-pal.40.dylib`opal_libevent2022_event_base_loop [inlined]
> event_process_active(base=0x000000010df069d0) at event.c:1440:8 [opt]
>
>     frame #6: 0x00000001011825ec
> libopen-pal.40.dylib`opal_libevent2022_event_base_loop(base=0x000000010df069d0,
> flags=<unavailable>) at event.c:1644:12 [opt]
>
>     frame #7: 0x0000000100bbfb04 orterun`orterun(argc=4,
> argv=0x000000016f2432f8) at orterun.c:179:9
>
>     frame #8: 0x0000000100bbf904 orterun`main(argc=4,
> argv=0x000000016f2432f8) at main.c:13:12
>
>     frame #9: 0x0000000100f19088 dyld`start + 516
>
> from 2:
>
> scottsayres@scotts-mbp ~ % lldb -p 95082
>
> (lldb) process attach --pid 95082
>
> Process 95082 stopped
>
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>
>     frame #0: 0x00000001bde25654 libsystem_kernel.dylib`read + 8
>
> libsystem_kernel.dylib`read:
>
> ->  0x1bde25654 <+8>:  b.lo   0x1bde25674               ; <+40>
>
>     0x1bde25658 <+12>: pacibsp
>
>     0x1bde2565c <+16>: stp    x29, x30, [sp, #-0x10]!
>
>     0x1bde25660 <+20>: mov    x29, sp
>
> Target 0: (orterun) stopped.
>
> Executable module set to "/usr/local/bin/orterun".
>
> Architecture set to: arm64e-apple-macosx-.
>
> (lldb) thread backtrace
>
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>
>   * frame #0: 0x00000001bde25654 libsystem_kernel.dylib`read + 8
>
>     frame #1: 0x000000010116969c libopen-pal.40.dylib`opal_fd_read(fd=22,
> len=20, buffer=0x000000016f24299c) at fd.c:51:14
>
>     frame #2: 0x0000000101563388
> mca_odls_default.so`do_parent(cd=0x0000600001e28200, read_fd=22) at
> odls_default_module.c:495:14
>
>     frame #3: 0x0000000101562d90
> mca_odls_default.so`odls_default_fork_local_proc(cdptr=0x0000600001e28200)
> at odls_default_module.c:651:12
>
>     frame #4: 0x0000000100e2c6f8
> libopen-rte.40.dylib`orte_odls_base_spawn_proc(fd=-1, sd=4,
> cbdata=0x0000600001e28200) at odls_base_default_fns.c:1046:31
>
>     frame #5: 0x00000001011827a0
> libopen-pal.40.dylib`opal_libevent2022_event_base_loop [inlined]
> event_process_active_single_queue(base=0x000000010df069d0) at event.c:1370:4 [opt]
>
>     frame #6: 0x0000000101182628
> libopen-pal.40.dylib`opal_libevent2022_event_base_loop [inlined]
> event_process_active(base=0x000000010df069d0) at event.c:1440:8 [opt]
>
>     frame #7: 0x00000001011825ec
> libopen-pal.40.dylib`opal_libevent2022_event_base_loop(base=0x000000010df069d0,
> flags=<unavailable>) at event.c:1644:12 [opt]
>
>     frame #8: 0x0000000100bbfb04 orterun`orterun(argc=4,
> argv=0x000000016f2432f8) at orterun.c:179:9
>
>     frame #9: 0x0000000100bbf904 orterun`main(argc=4,
> argv=0x000000016f2432f8) at main.c:13:12
>
>     frame #10: 0x0000000100f19088 dyld`start + 516
>
>
