That backtrace seems to imply that the launch may not have completed.

Can you make an executable script foo.sh with:

#!/bin/bash
# Print the date once a second, 10 times.
i=0
while test $i -lt 10; do
    date
    sleep 1
    i=$((i + 1))
done

Make sure that foo.sh is executable (e.g., chmod +x foo.sh) and then run it via:

mpirun -np 1 foo.sh

If you start seeing output, good! If it completes, better!

If it hangs and/or you don't see any output at all, run this from another terminal:


ps auxwww | egrep 'mpirun|foo.sh'

It should show mpirun and 2 copies of foo.sh (and probably a grep).  Does it?
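The run-then-check procedure above can also be scripted. The following is a minimal bash sketch, not part of Open MPI; the function name check_for_hang and its arguments are hypothetical. It starts the command in the background, polls it once a second with kill -0, and, if the command is still alive after the time budget, prints the same kind of ps listing as the manual egrep check. On timeout it deliberately leaves the command running so it can still be inspected (and killed afterwards).

```shell
#!/bin/bash
# Hypothetical helper (not part of Open MPI): run a command in the background,
# give it a time budget, and if it is still running afterwards, list the
# matching processes like the manual `ps auxwww | egrep ...` check.
#
# Usage: check_for_hang SECONDS PATTERN COMMAND [ARGS...]
check_for_hang() {
    local budget=$1 pattern=$2
    shift 2

    "$@" &                      # start the command, e.g. mpirun -np 1 foo.sh
    local pid=$!

    local waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$budget" ]; then
            echo "still running after ${budget}s (pid $pid); matching processes:"
            # Same idea as: ps auxwww | egrep 'mpirun|foo.sh'
            ps auxwww | grep -E "$pattern" | grep -v grep
            return 1            # leave the command running for inspection
        fi
        sleep 1
        waited=$((waited + 1))
    done

    wait "$pid"                 # collect the command's exit status
    echo "completed with status $?"
}
```

For the test above, something like check_for_hang 20 'mpirun|foo.sh' mpirun -np 1 foo.sh gives foo.sh (about 10 seconds of output) enough headroom before it is reported as hung.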

--
Jeff Squyres
jsquy...@cisco.com

________________________________________
From: Scott Sayres <ssay...@asu.edu>
Sent: Wednesday, May 4, 2022 2:47 PM
To: Open MPI Users
Cc: Jeff Squyres (jsquyres)
Subject: Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

Following Jeff's advice, I have rebuilt Open MPI by hand with the -g option, which
shows more information (below). I am trying George's advice on how to track the
child process, but I notice that gdb does not support arm64, so I am attempting
to update lldb.


scottsayres@scotts-mbp openmpi-4.1.3 % lldb mpirun -- -np 1 hostname
(lldb) target create "mpirun"
Current executable set to 'mpirun' (arm64).
(lldb) settings set -- target.run-args  "-np" "1" "hostname"
(lldb) run
Process 90950 launched: '/usr/local/bin/mpirun' (arm64)
Process 90950 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x00000001bde25654 libsystem_kernel.dylib`read + 8
libsystem_kernel.dylib`read:
->  0x1bde25654 <+8>:  b.lo   0x1bde25674               ; <+40>
    0x1bde25658 <+12>: pacibsp
    0x1bde2565c <+16>: stp    x29, x30, [sp, #-0x10]!
    0x1bde25660 <+20>: mov    x29, sp
Target 0: (mpirun) stopped.
(lldb) ^C
(lldb) thread backtrace
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00000001bde25654 libsystem_kernel.dylib`read + 8
    frame #1: 0x000000010056169c libopen-pal.40.dylib`opal_fd_read(fd=27, len=20, buffer=0x000000016fdfe90c) at fd.c:51:14
    frame #2: 0x00000001027b3388 mca_odls_default.so`do_parent(cd=0x0000600003e00000, read_fd=27) at odls_default_module.c:495:14
    frame #3: 0x00000001027b2d90 mca_odls_default.so`odls_default_fork_local_proc(cdptr=0x0000600003e00000) at odls_default_module.c:651:12
    frame #4: 0x00000001003246f8 libopen-rte.40.dylib`orte_odls_base_spawn_proc(fd=-1, sd=4, cbdata=0x0000600003e00000) at odls_base_default_fns.c:1046:31
    frame #5: 0x000000010057a7a0 libopen-pal.40.dylib`opal_libevent2022_event_base_loop [inlined] event_process_active_single_queue(base=0x00000001007061c0) at event.c:1370:4 [opt]
    frame #6: 0x000000010057a628 libopen-pal.40.dylib`opal_libevent2022_event_base_loop [inlined] event_process_active(base=0x00000001007061c0) at event.c:1440:8 [opt]
    frame #7: 0x000000010057a5ec libopen-pal.40.dylib`opal_libevent2022_event_base_loop(base=0x00000001007061c0, flags=<unavailable>) at event.c:1644:12 [opt]
    frame #8: 0x0000000100003b04 mpirun`orterun(argc=4, argv=0x000000016fdff268) at orterun.c:179:9
    frame #9: 0x0000000100003904 mpirun`main(argc=4, argv=0x000000016fdff268) at main.c:13:12
    frame #10: 0x0000000100015088 dyld`start + 516

