Re: [O-MPI devel] broken rmgr?
I think this is fixed now. There was a bug in the poll ops that was uncovered yesterday when I fixed a deadlock issue in the event library.

Brian

On Aug 4, 2005, at 7:19 AM, Jeff Squyres wrote:

> I can confirm this -- running a simple MPI "hello world" on one node with the rsh pls, the MPI processes finish and exit, but then orterun hangs:
>
> (gdb) bt
> #0  0xb7e8ef88 in poll () from /lib/libc.so.6
> #1  0xb7f4f8a5 in poll_dispatch (arg=0xb7f6f080, tv=0xbfffe4f8) at poll.c:196
> #2  0xb7f4d72b in opal_event_loop (flags=1) at event.c:515
> #3  0xb7f5ac6e in opal_progress () at opal_progress.c:211
> #4  0xb7d6fca1 in opal_condition_wait (c=0xb7d7242c, m=0xb7d72418) at condition.h:72
> #5  0xb7d6f7f0 in orte_pls_rsh_finalize () at pls_rsh_module.c:833
> #6  0xb7fb3ab6 in orte_pls_base_finalize () at pls_base_close.c:40
> #7  0xb7d9092f in orte_rmgr_urm_finalize () at rmgr_urm.c:336
> #8  0xb7fc14f7 in orte_rmgr_base_close () at rmgr_base_close.c:33
> #9  0xb7fd3563 in orte_system_finalize () at orte_system_finalize.c:61
> #10 0xb7fceca5 in orte_finalize () at orte_finalize.c:36
> #11 0x0804a0d9 in main (argc=4, argv=0xbfffe6d4) at orterun.c:390
>
> Am investigating...
>
> On Aug 3, 2005, at 10:55 PM, Ralph H. Castain wrote:
>
>> Hmmm...it was running for me last night and (I thought) this morning, but I'll test it again and see if I can reproduce the problem. Could be something crept in there.
>>
>> At 06:28 PM 8/3/2005, you wrote:
>>
>>> I just noticed that mpirun hangs forever inside the orte_rmgr.finalize() routine. AFAIK this is new to today, and confirmed on PPC64, x86-64, x86-32. Don't have the immediate time, at the moment, to dig deeper, but thought I would throw that out there.
>>>
>>> Cheers,
>>> Josh
>>>
>>> Josh Hursey
>>> jjhur...@open-mpi.org
>>> http://www.open-mpi.org/
>
> --
> {+} Jeff Squyres
> {+} The Open MPI Project
> {+} http://www.open-mpi.org/

--
Brian Barrett
Open MPI developer
http://www.open-mpi.org/

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [O-MPI devel] broken rmgr?
I can confirm this -- running a simple MPI "hello world" on one node with the rsh pls, the MPI processes finish and exit, but then orterun hangs:

(gdb) bt
#0  0xb7e8ef88 in poll () from /lib/libc.so.6
#1  0xb7f4f8a5 in poll_dispatch (arg=0xb7f6f080, tv=0xbfffe4f8) at poll.c:196
#2  0xb7f4d72b in opal_event_loop (flags=1) at event.c:515
#3  0xb7f5ac6e in opal_progress () at opal_progress.c:211
#4  0xb7d6fca1 in opal_condition_wait (c=0xb7d7242c, m=0xb7d72418) at condition.h:72
#5  0xb7d6f7f0 in orte_pls_rsh_finalize () at pls_rsh_module.c:833
#6  0xb7fb3ab6 in orte_pls_base_finalize () at pls_base_close.c:40
#7  0xb7d9092f in orte_rmgr_urm_finalize () at rmgr_urm.c:336
#8  0xb7fc14f7 in orte_rmgr_base_close () at rmgr_base_close.c:33
#9  0xb7fd3563 in orte_system_finalize () at orte_system_finalize.c:61
#10 0xb7fceca5 in orte_finalize () at orte_finalize.c:36
#11 0x0804a0d9 in main (argc=4, argv=0xbfffe6d4) at orterun.c:390

Am investigating...

On Aug 3, 2005, at 10:55 PM, Ralph H. Castain wrote:

> Hmmm...it was running for me last night and (I thought) this morning, but I'll test it again and see if I can reproduce the problem. Could be something crept in there.
>
> At 06:28 PM 8/3/2005, you wrote:
>
>> I just noticed that mpirun hangs forever inside the orte_rmgr.finalize() routine. AFAIK this is new to today, and confirmed on PPC64, x86-64, x86-32. Don't have the immediate time, at the moment, to dig deeper, but thought I would throw that out there.
>>
>> Cheers,
>> Josh
>>
>> Josh Hursey
>> jjhur...@open-mpi.org
>> http://www.open-mpi.org/

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/
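For readers following the backtrace: frames #0 through #5 show a finalize routine blocked in a condition wait that drives the progress loop, which in turn sits in poll(). Below is a minimal sketch of that general pattern, not Open MPI's actual code. Every name in it (sketch_condition_t, sketch_progress, sketch_condition_wait, max_passes) is hypothetical, and it does not try to reproduce the specific bug, which the thread does not describe. It only illustrates why a waiter of this shape never returns if the event it depends on is never delivered.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-in for a condition in a single-threaded build:
 * "signaled" is set by event handling that runs inside the progress loop. */
typedef struct {
    bool signaled;
} sketch_condition_t;

/* One pass of a toy event loop: poll a single fd and, if it is readable,
 * consume one byte and mark the condition signaled.
 * Returns the number of events handled (0 or 1). */
static int sketch_progress(int fd, sketch_condition_t *c, int timeout_ms)
{
    assert(c != NULL);
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc = poll(&pfd, 1, timeout_ms);
    if (rc > 0 && (pfd.revents & POLLIN)) {
        char byte;
        if (read(fd, &byte, 1) == 1) {
            c->signaled = true;
            return 1;
        }
    }
    return 0;
}

/* Wait for the condition by repeatedly driving progress.
 * The pass limit is an assumption added for this sketch so that it
 * terminates; a loop without it would spin in poll() indefinitely when
 * the expected event never arrives.
 * Returns 0 once signaled, -1 if the limit is reached first. */
static int sketch_condition_wait(int fd, sketch_condition_t *c, int max_passes)
{
    assert(c != NULL);
    for (int i = 0; i < max_passes && !c->signaled; i++) {
        sketch_progress(fd, c, 10 /* ms */);
    }
    return c->signaled ? 0 : -1;
}
```

With the read end of a pipe as fd, sketch_condition_wait() returns -1 while nothing has been written and 0 once a byte is written to the other end. The first case corresponds to the situation in the backtrace, where the waiter keeps polling for an event that does not come.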
Re: [O-MPI devel] broken rmgr?
Hmmm...it was running for me last night and (I thought) this morning, but I'll test it again and see if I can reproduce the problem. Could be something crept in there.

At 06:28 PM 8/3/2005, you wrote:

> I just noticed that mpirun hangs forever inside the orte_rmgr.finalize() routine. AFAIK this is new to today, and confirmed on PPC64, x86-64, x86-32. Don't have the immediate time, at the moment, to dig deeper, but thought I would throw that out there.
>
> Cheers,
> Josh
>
> Josh Hursey
> jjhur...@open-mpi.org
> http://www.open-mpi.org/