Scott,

I am afraid this test is inconclusive since stdout is processed by mpirun.

What if you run

mpirun -np 1 touch /tmp/xyz

then abort it (since it will likely hang) and run

ls -l /tmp/xyz

to see whether the file was created?
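The manual check above can be scripted so that the verdict comes from a file on disk rather than from anything mpirun forwards to stdout. This is a minimal sketch: `probe_launch` is a hypothetical helper (not part of Open MPI), and the grace period passed to it is an arbitrary choice.

```shell
#!/bin/bash
# probe_launch MARKER WAIT_SECS LAUNCHER [ARGS...]
# Hypothetical helper, not part of Open MPI. It runs
#   LAUNCHER [ARGS...] touch MARKER
# in the background, waits WAIT_SECS seconds, force-kills the launcher if it
# is still running, and reports whether the marker file was created. The
# verdict comes from the file system, not from stdout forwarded by mpirun.
probe_launch() {
    local marker=$1 wait_secs=$2
    shift 2
    rm -f "$marker"
    # Detach the launcher from our stdin/stdout so a hung launcher cannot
    # block a caller that captures this function's output.
    "$@" touch "$marker" </dev/null >/dev/null 2>&1 &
    local pid=$!
    sleep "$wait_secs"
    if kill -0 "$pid" 2>/dev/null; then
        # Still alive after the grace period, i.e. the hang. Use SIGKILL,
        # since mpirun was still listed by ps after Ctrl-C in this thread.
        kill -KILL "$pid" 2>/dev/null
        echo "launcher still running after ${wait_secs}s (killed)"
    fi
    wait "$pid" 2>/dev/null
    if [ -e "$marker" ]; then
        echo "child ran: $marker exists"
    else
        echo "child never ran: $marker missing"
    fi
}
```

For example, `probe_launch /tmp/xyz 10 mpirun -np 1` reproduces the test above. A missing marker would mean the child was never started, which would be consistent with the backtrace quoted below, where mpirun is blocked in `read()` called from `do_parent`.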

In my experience on a Mac, this kind of hang can happen if you are running a
firewall and/or the IP address of your host does not match its hostname.
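Separately, the `ps auxwww | egrep 'mpirun|foo.sh'` check quoted below also matches the `egrep` process itself, which is why an `egrep mpirun|foo.sh` line shows up in Scott's output. A minimal sketch that counts only the real processes, assuming a `ps` that accepts `axww -o command=` (as the macOS and Linux procps versions do); `count_procs` is a hypothetical name:

```shell
#!/bin/bash
# count_procs PATTERN
# Hypothetical helper: prints how many running processes have a full command
# line matching the extended regular expression PATTERN.
# Writing the pattern as '[m]pirun' instead of 'mpirun' stops grep from
# counting itself: grep's own command line holds the literal text "[m]pirun",
# which the regex [m]pirun (matching "mpirun") does not match.
count_procs() {
    ps axww -o command= | grep -cE -- "$1"
}

echo "mpirun: $(count_procs '[m]pirun')"
echo "foo.sh: $(count_procs '[f]oo\.sh')"
```

Running it while mpirun appears hung shows directly whether any foo.sh process exists at all.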


Cheers,

Gilles

On Thu, May 5, 2022 at 5:06 AM Scott Sayres via users <
users@lists.open-mpi.org> wrote:

> foo.sh is executable, but mpirun again hangs without output.
> I pressed Ctrl-C twice to return to the shell, then ran
>
> ps auxwww | egrep 'mpirun|foo.sh'
>
> The output is shown below.
>
> scottsayres@scotts-mbp trouble-shoot % ./foo.sh
> Wed May  4 12:59:15 MST 2022
> Wed May  4 12:59:16 MST 2022
> Wed May  4 12:59:17 MST 2022
> Wed May  4 12:59:18 MST 2022
> Wed May  4 12:59:19 MST 2022
> Wed May  4 12:59:20 MST 2022
> Wed May  4 12:59:21 MST 2022
> Wed May  4 12:59:22 MST 2022
> Wed May  4 12:59:23 MST 2022
> Wed May  4 12:59:24 MST 2022
>
> scottsayres@scotts-mbp trouble-shoot % mpirun -np 1 foo.sh
> ^C^C%
>
> scottsayres@scotts-mbp trouble-shoot % ps auxwww | egrep 'mpirun|foo.sh'
> scottsayres      91795 100.0  0.0 409067920   1456 s002  R    12:59PM   0:14.07 mpirun -np 1 foo.sh
> scottsayres      91798   0.0  0.0 408628368   1632 s002  S+    1:00PM   0:00.00 egrep mpirun|foo.sh
>
> scottsayres@scotts-mbp trouble-shoot %
>
>
> On Wed, May 4, 2022 at 12:42 PM Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
>
>> That backtrace seems to imply that the launch may not have completed.
>>
>> Can you make an executable script foo.sh with:
>>
>> #!/bin/bash
>>
>> i=0
>> while test $i -lt 10; do
>>     date
>>     sleep 1
>>     let i=$i+1
>> done
>>
>>
>> Make sure that foo.sh is executable and then run it via:
>>
>> mpirun -np 1 foo.sh
>>
>> If you start seeing output, good! If it completes, better!
>>
>> If it hangs, and/or if you don't see any output at all, do this:
>>
>> ps auxwww | egrep 'mpirun|foo.sh'
>>
>> It should show mpirun and 2 copies of foo.sh (and probably a grep).  Does
>> it?
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>>
>> ________________________________________
>> From: Scott Sayres <ssay...@asu.edu>
>> Sent: Wednesday, May 4, 2022 2:47 PM
>> To: Open MPI Users
>> Cc: Jeff Squyres (jsquyres)
>> Subject: Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3
>>
>> Following Jeff's advice, I have rebuilt Open MPI by hand with the -g
>> option, which shows more information (below). I am trying George's
>> advice on how to track the child process, but I notice that gdb does not
>> support arm64, so I am attempting to update lldb.
>>
>>
>> scottsayres@scotts-mbp openmpi-4.1.3 % lldb mpirun -- -np 1 hostname
>> (lldb) target create "mpirun"
>> Current executable set to 'mpirun' (arm64).
>> (lldb) settings set -- target.run-args  "-np" "1" "hostname"
>> (lldb) run
>> Process 90950 launched: '/usr/local/bin/mpirun' (arm64)
>> Process 90950 stopped
>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>>     frame #0: 0x00000001bde25654 libsystem_kernel.dylib`read + 8
>> libsystem_kernel.dylib`read:
>> ->  0x1bde25654 <+8>:  b.lo   0x1bde25674               ; <+40>
>>     0x1bde25658 <+12>: pacibsp
>>     0x1bde2565c <+16>: stp    x29, x30, [sp, #-0x10]!
>>     0x1bde25660 <+20>: mov    x29, sp
>> Target 0: (mpirun) stopped.
>> (lldb) ^C
>> (lldb) thread backtrace
>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>>   * frame #0: 0x00000001bde25654 libsystem_kernel.dylib`read + 8
>>     frame #1: 0x000000010056169c libopen-pal.40.dylib`opal_fd_read(fd=27, len=20, buffer=0x000000016fdfe90c) at fd.c:51:14
>>     frame #2: 0x00000001027b3388 mca_odls_default.so`do_parent(cd=0x0000600003e00000, read_fd=27) at odls_default_module.c:495:14
>>     frame #3: 0x00000001027b2d90 mca_odls_default.so`odls_default_fork_local_proc(cdptr=0x0000600003e00000) at odls_default_module.c:651:12
>>     frame #4: 0x00000001003246f8 libopen-rte.40.dylib`orte_odls_base_spawn_proc(fd=-1, sd=4, cbdata=0x0000600003e00000) at odls_base_default_fns.c:1046:31
>>     frame #5: 0x000000010057a7a0 libopen-pal.40.dylib`opal_libevent2022_event_base_loop [inlined] event_process_active_single_queue(base=0x00000001007061c0) at event.c:1370:4 [opt]
>>     frame #6: 0x000000010057a628 libopen-pal.40.dylib`opal_libevent2022_event_base_loop [inlined] event_process_active(base=0x00000001007061c0) at event.c:1440:8 [opt]
>>     frame #7: 0x000000010057a5ec libopen-pal.40.dylib`opal_libevent2022_event_base_loop(base=0x00000001007061c0, flags=<unavailable>) at event.c:1644:12 [opt]
>>     frame #8: 0x0000000100003b04 mpirun`orterun(argc=4, argv=0x000000016fdff268) at orterun.c:179:9
>>     frame #9: 0x0000000100003904 mpirun`main(argc=4, argv=0x000000016fdff268) at main.c:13:12
>>     frame #10: 0x0000000100015088 dyld`start + 516
>>
>>
>>
>
> --
> Scott G Sayres
> Assistant Professor
> School of Molecular Sciences (formerly Department of Chemistry &
> Biochemistry)
> Biodesign Center for Applied Structural Discovery
> Arizona State University
>
