Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs
Hello,

I used Brice's workaround and now mpirun works well on all computers! Thank you all for your help.

Jorge

On 14/11/2020 at 23:11, Brice Goglin via users wrote:

Hello,

The hwloc/X11 stuff is caused by Open MPI using a hwloc that was built with the GL backend enabled (in your case, because the package libhwloc-plugins is installed). That backend is used for querying the locality of X11 displays running on NVIDIA GPUs (using libxnvctrl). Does running "lstopo" fail/hang too? (It basically runs hwloc without Open MPI.)

One workaround should be to set HWLOC_COMPONENTS=-gl in your environment so that this backend is ignored.

Recent hwloc releases have a way to avoid some plugins at runtime through the C interface; we should likely blacklist all plugins that are already blacklisted at compile time when OMPI builds its own hwloc.

Brice
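The workaround Brice describes can be tried in a single shell session before making it permanent. What follows is only a minimal sketch, assuming a Bourne-style shell and the `hello` binary built earlier in the thread. The `--of console` option is an assumption added here so that lstopo prints text instead of opening a window, and the guards simply skip a step when the tool or the binary is not present.

```shell
# Exclude hwloc's GL backend for this shell session only.
# The leading "-" asks hwloc to ignore the named component.
export HWLOC_COMPONENTS=-gl

# Check hwloc on its own first: if lstopo returns promptly here,
# the hang came from the GL backend rather than from Open MPI.
if command -v lstopo >/dev/null 2>&1; then
    lstopo --of console >/dev/null && echo "lstopo OK"
fi

# Then retry the launch that used to hang (skipped if ./hello is absent).
if command -v mpirun >/dev/null 2>&1 && [ -x ./hello ]; then
    mpirun -np 1 ./hello
fi
```

If this helps, the `export` line can go into a shell startup file such as `~/.profile` so that it applies to every session.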
Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs
Sorry, if I execute mpirun in a *really* bare terminal, without an X server running, it works! But with an error message:

    Invalid MIT-MAGIC-COOKIE-1 key

So the problem is related to X, but I still have no solution.

Jorge
Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs
Hello,

In spite of the delay, I was not able to solve my problem. Thanks to Joseph and Prentice for their interesting suggestions.

I uninstalled AppArmor (SELinux is not installed) as suggested by Prentice, but nothing changed: mpirun still hangs.

The result of the gdb stack trace is the following:

    $ sudo gdb -batch -ex "thread apply all bt" -p $(ps -C mpirun -o pid= | head -n 1)
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    0x7f9289544307 in __libc_connect (fd=9, addr=..., len=16) at ../sysdeps/unix/sysv/linux/connect.c:26
    26      ../sysdeps/unix/sysv/linux/connect.c: Aucun fichier ou dossier de ce type.

    Thread 1 (Thread 0x7f92891f4e80 (LWP 4948)):
    #0  0x7f9289544307 in __libc_connect (fd=9, addr=..., len=16) at ../sysdeps/unix/sysv/linux/connect.c:26
    #1  0x7f9288fff59d in ?? () from /lib/x86_64-linux-gnu/libxcb.so.1
    #2  0x7f9288fffc49 in xcb_connect_to_display_with_auth_info () from /lib/x86_64-linux-gnu/libxcb.so.1
    #3  0x7f928906cb7a in _XConnectXCB () from /lib/x86_64-linux-gnu/libX11.so.6
    #4  0x7f928905d319 in XOpenDisplay () from /lib/x86_64-linux-gnu/libX11.so.6
    #5  0x7f92897de4fb in ?? () from /usr/lib/x86_64-linux-gnu/hwloc/hwloc_gl.so
    #6  0x7f92893b901e in ?? () from /lib/x86_64-linux-gnu/libhwloc.so.15
    #7  0x7f92893c13a0 in hwloc_topology_load () from /lib/x86_64-linux-gnu/libhwloc.so.15
    #8  0x7f92896df564 in opal_hwloc_base_get_topology () from /lib/x86_64-linux-gnu/libopen-pal.so.40
    #9  0x7f92891da6be in ?? () from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_hnp.so
    #10 0x7f92897a22fc in orte_init () from /lib/x86_64-linux-gnu/libopen-rte.so.40
    #11 0x7f92897a6c86 in orte_submit_init () from /lib/x86_64-linux-gnu/libopen-rte.so.40
    #12 0x55819fd0b3a3 in ?? ()
    #13 0x7f92894480b3 in __libc_start_main (main=0x55819fd0b1c0, argc=1, argv=0x7fff5334fe48, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff5334fe38) at ../csu/libc-start.c:308
    #14 0x55819fd0b1fe in ?? ()
    [Inferior 1 (process 4948) detached]

So it seems to be a problem in the connection via libxcb (a socket?), but this is beyond my system administration skills. Is there any authorization needed? As libX11 is at the origin of the call, I tried to execute mpirun in a bare terminal (ctrl-alt-f2 and via ssh), but the result is the same. I tried to recompile/install the whole package and got the same result.

Thank you for your help.

Jorge

On 22/10/2020 at 12:16, Joseph Schuchart via users wrote:

Hi Jorge,

Can you try to get a stack trace of mpirun using the following command in a separate terminal?

    sudo gdb -batch -ex "thread apply all bt" -p $(ps -C mpirun -o pid= | head -n 1)

Maybe that will give some insight into where mpirun is hanging.

Cheers,
Joseph
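The two frames that matter in a backtrace like the one in this message are XOpenDisplay and the hwloc_gl.so plugin. As a rough sketch, a saved trace can be checked for that pattern with a small helper. The function name `trace_points_to_hwloc_gl` and the idea of saving the gdb output to a file are assumptions made for illustration; they are not part of gdb, hwloc, or Open MPI.

```shell
# Hypothetical helper: report whether a saved gdb backtrace shows the
# hwloc GL plugin blocked while opening an X11 display.
trace_points_to_hwloc_gl() {
    trace_file=$1
    # Both frames must be present: the X11 call and the hwloc GL plugin.
    if grep -q 'XOpenDisplay' "$trace_file" &&
       grep -q 'hwloc_gl\.so' "$trace_file"; then
        echo "hwloc GL backend is waiting on X11"
    else
        echo "no hwloc GL / X11 frames found"
        return 1
    fi
}
```

To use it, redirect the output of the gdb command quoted in this message to a file, for example `trace.txt`, and call `trace_points_to_hwloc_gl trace.txt`.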
Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs
Hello Jeff,

The program is not executed; it seems to wait for something to connect to (and why is ctrl-C needed twice?):

    jorge@gcp26:~/MPIRUN$ mpirun -np 1 touch /tmp/foo
    ^C^C
    jorge@gcp26:~/MPIRUN$ ls -l /tmp/foo
    ls: impossible d'accéder à '/tmp/foo': Aucun fichier ou dossier de ce type

No file is created (the ls message means "no such file or directory").

In fact, my question was whether there are differences in mpirun usage between these versions. mpirun -help gives a different output, as expected, but I tried a lot of options without any success.

On 21/10/2020 at 21:16, Jeff Squyres (jsquyres) wrote:

There are huge differences between Open MPI v2.1.1 and v4.0.3 (i.e., years of development effort); it would be very hard to categorize them all; sorry!

What happens if you run

    mpirun -np 1 touch /tmp/foo

(Yes, you can run non-MPI apps through mpirun.)

Is /tmp/foo created? (I.e., did the job run, and mpirun is somehow not terminating?)
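Jeff's test separates "the job ran but mpirun did not exit" from "nothing ran at all". It can be repeated without pressing ctrl-C by hand, using a time limit. The sketch below assumes GNU coreutils `timeout` is available; the helper name `run_or_report_hang` is made up for illustration.

```shell
# Hypothetical helper: run a command under a time limit and report whether
# it finished by itself or had to be stopped.
# GNU timeout exits with 124 when the limit is reached, or 137 when the
# follow-up KILL requested with -k was needed.
run_or_report_hang() {
    limit=$1
    shift
    status=0
    timeout -k 2 "$limit" "$@" || status=$?
    case $status in
        124|137) echo "hung: killed after ${limit}s" ;;
        *)       echo "finished with status $status" ;;
    esac
}
```

For the case discussed here, `run_or_report_hang 10 mpirun -np 1 touch /tmp/foo` followed by `ls -l /tmp/foo` shows both whether mpirun returned and whether the job itself ran.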
Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs
Hello Gus,

Thank you for your answer. Unfortunately my problem is much more basic. I didn't try to run the program on both computers, just to run something on one computer. I just installed the new OS and openmpi on two different computers, in the standard way, with the same result.

For example, in Kubuntu 20.04.1 LTS with openmpi 4.0.3-0ubuntu:

    jorge@gcp26:~/MPIRUN$ cat hello.f90
    print*,"Hello World!"
    end
    jorge@gcp26:~/MPIRUN$ mpif90 hello.f90 -o hello
    jorge@gcp26:~/MPIRUN$ ./hello
    Hello World!
    jorge@gcp26:~/MPIRUN$ mpirun -np 1 hello     <--- here the program hangs with no output
    ^C^Cjorge@gcp26:~/MPIRUN$

The mpirun task sleeps with no output, and only pressing ctrl-C twice ends the execution:

    jorge 5540 0.1 0.0 44768 8472 pts/8 S+ 17:54 0:00 mpirun -np 1 hello

In Kubuntu 18.04.5 LTS with openmpi 2.1.1, of course, the same program gives:

    jorge@gcp30:~/MPIRUN$ cat hello.f90
    print*, "Hello World!"
    END
    jorge@gcp30:~/MPIRUN$ mpif90 hello.f90 -o hello
    jorge@gcp30:~/MPIRUN$ ./hello
    Hello World!
    jorge@gcp30:~/MPIRUN$ mpirun -np 1 hello
    Hello World
    jorge@gcp30:~/MPIRUN$

Even just typing mpirun hangs without the usual error message. Are there any changes between the two versions of openmpi that I am missing? Is some package needed by mpirun missing?

Thank you again for your help.

Jorge

On 21/10/2020 at 00:20, Gus Correa wrote:

Hi Jorge,

You may have an active firewall protecting either computer or both, and preventing mpirun from starting the connection. Your /etc/hosts file may also not have the computers' IP addresses. You may also want to try the --hostfile option. Likewise, the --verbose option may help diagnose the problem.

It would help if you sent the mpirun command line, the hostfile (if any), the error message (if any), etc.

These FAQs may help diagnose and solve the problem:

https://www.open-mpi.org/faq/?category=running#diagnose-multi-host-problems
https://www.open-mpi.org/faq/?category=running#mpirun-hostfile
https://www.open-mpi.org/faq/?category=running

I hope this helps,
Gus Correa
[OMPI users] mpirun on Kubuntu 20.4.1 hangs
Hello,

I installed Kubuntu 20.04.1 with openmpi 4.0.3-0ubuntu on two different computers in the standard way. Compiling with mpif90 works, but mpirun hangs with no output on both systems. Even the mpirun command without parameters hangs, and only typing ctrl-C twice ends the sleeping program. Only the command mpirun --help gives the usual output. It seems to be something related to the terminal output, but the command worked well on Kubuntu 18.04.

Is there a way to debug or fix this problem (without recompiling from sources, etc.)? Is it a known problem?

Thanks,
Jorge