Re: [O-MPI devel] broken rmgr?

2005-08-04 Thread Brian Barrett
I think this is fixed now.  There was a bug in the poll ops that was  
uncovered yesterday when I fixed a deadlock issue in the event library.


Brian

On Aug 4, 2005, at 7:19 AM, Jeff Squyres wrote:


I can confirm this -- running a simple MPI "hello world" on one node
with the rsh pls, the MPI processes finish and exit, but then orterun
hangs:

(gdb) bt
#0  0xb7e8ef88 in poll () from /lib/libc.so.6
#1  0xb7f4f8a5 in poll_dispatch (arg=0xb7f6f080, tv=0xbfffe4f8) at poll.c:196
#2  0xb7f4d72b in opal_event_loop (flags=1) at event.c:515
#3  0xb7f5ac6e in opal_progress () at opal_progress.c:211
#4  0xb7d6fca1 in opal_condition_wait (c=0xb7d7242c, m=0xb7d72418) at condition.h:72
#5  0xb7d6f7f0 in orte_pls_rsh_finalize () at pls_rsh_module.c:833
#6  0xb7fb3ab6 in orte_pls_base_finalize () at pls_base_close.c:40
#7  0xb7d9092f in orte_rmgr_urm_finalize () at rmgr_urm.c:336
#8  0xb7fc14f7 in orte_rmgr_base_close () at rmgr_base_close.c:33
#9  0xb7fd3563 in orte_system_finalize () at orte_system_finalize.c:61
#10 0xb7fceca5 in orte_finalize () at orte_finalize.c:36
#11 0x0804a0d9 in main (argc=4, argv=0xbfffe6d4) at orterun.c:390

Am investigating...




On Aug 3, 2005, at 10:55 PM, Ralph H. Castain wrote:



Hmmm...it was running for me last night and (I thought) this morning,
but I'll test it again and see if I can reproduce the problem. Could
be something crept in there.


At 06:28 PM 8/3/2005, you wrote:


I just noticed that mpirun hangs forever inside the
orte_rmgr.finalize() routine. AFAIK this is new to today, and confirmed
on PPC64, x86-64, x86-32.

Don't have the immediate time, at the moment, to dig deeper, but
thought I would throw that out there.

Cheers,
Josh


Josh Hursey
jjhur...@open-mpi.org
http://www.open-mpi.org/

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel








--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/




--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/




Re: [O-MPI devel] broken rmgr?

2005-08-04 Thread Jeff Squyres
I can confirm this -- running a simple MPI "hello world" on one node 
with the rsh pls, the MPI processes finish and exit, but then orterun 
hangs:


(gdb) bt
#0  0xb7e8ef88 in poll () from /lib/libc.so.6
#1  0xb7f4f8a5 in poll_dispatch (arg=0xb7f6f080, tv=0xbfffe4f8) at poll.c:196
#2  0xb7f4d72b in opal_event_loop (flags=1) at event.c:515
#3  0xb7f5ac6e in opal_progress () at opal_progress.c:211
#4  0xb7d6fca1 in opal_condition_wait (c=0xb7d7242c, m=0xb7d72418) at condition.h:72
#5  0xb7d6f7f0 in orte_pls_rsh_finalize () at pls_rsh_module.c:833
#6  0xb7fb3ab6 in orte_pls_base_finalize () at pls_base_close.c:40
#7  0xb7d9092f in orte_rmgr_urm_finalize () at rmgr_urm.c:336
#8  0xb7fc14f7 in orte_rmgr_base_close () at rmgr_base_close.c:33
#9  0xb7fd3563 in orte_system_finalize () at orte_system_finalize.c:61
#10 0xb7fceca5 in orte_finalize () at orte_finalize.c:36
#11 0x0804a0d9 in main (argc=4, argv=0xbfffe6d4) at orterun.c:390

Am investigating...
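
For readers following the trace: frames #0 through #4 show a condition wait that drives the event loop itself, so if the event that should signal the condition is never delivered, the wait never returns. Below is a minimal sketch of that pattern, not the actual Open MPI code. Every name in it (sketch_cond, sketch_progress, sketch_cond_wait, the pipe-based wake-up, and the max_passes bound) is hypothetical and was made up for illustration; the real opal_condition_wait has no such bound, which is why the hang is indefinite.

```c
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-in for a condition that an event loop signals. */
struct sketch_cond {
    int  wake_fd;   /* read end of a pipe; readable once the event fires */
    bool signaled;  /* set by sketch_progress() when the event is seen */
};

/* One pass of a toy event loop: poll the descriptor for up to timeout_ms
 * and mark the condition signaled if a wake-up byte arrived. */
static void sketch_progress(struct sketch_cond *c, int timeout_ms)
{
    struct pollfd pfd = { .fd = c->wake_fd, .events = POLLIN, .revents = 0 };

    if (poll(&pfd, 1, timeout_ms) > 0 && (pfd.revents & POLLIN)) {
        char byte;
        if (read(c->wake_fd, &byte, 1) == 1) {
            c->signaled = true;
        }
    }
}

/* Wait for the condition by repeatedly driving the event loop.
 * Returns 0 once signaled, or -1 after max_passes passes without a signal.
 * Assumption: the bound exists only so this sketch terminates; without it,
 * a lost event leaves the caller sitting in poll() indefinitely. */
static int sketch_cond_wait(struct sketch_cond *c, int max_passes)
{
    for (int pass = 0; pass < max_passes; ++pass) {
        if (c->signaled) {
            return 0;
        }
        sketch_progress(c, 10);
    }
    return c->signaled ? 0 : -1;
}
```

With a pipe whose write end never receives a byte, sketch_cond_wait() gives up and returns -1; once a byte is written, the next call returns 0.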




On Aug 3, 2005, at 10:55 PM, Ralph H. Castain wrote:


Hmmm...it was running for me last night and (I thought) this morning,
but I'll test it again and see if I can reproduce the problem. Could
be something crept in there.


At 06:28 PM 8/3/2005, you wrote:

I just noticed that mpirun hangs forever inside the
orte_rmgr.finalize() routine. AFAIK this is new to today, and confirmed
on PPC64, x86-64, x86-32.

Don't have the immediate time, at the moment, to dig deeper, but
thought I would throw that out there.

Cheers,
Josh


Josh Hursey
jjhur...@open-mpi.org
http://www.open-mpi.org/







--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



Re: [O-MPI devel] broken rmgr?

2005-08-03 Thread Ralph H. Castain
Hmmm...it was running for me last night and (I thought) this morning, 
but I'll test it again and see if I can reproduce the problem. Could 
be something crept in there.



At 06:28 PM 8/3/2005, you wrote:

I just noticed that mpirun hangs forever inside the
orte_rmgr.finalize() routine. AFAIK this is new to today, and confirmed
on PPC64, x86-64, x86-32.

Don't have the immediate time, at the moment, to dig deeper, but
thought I would throw that out there.

Cheers,
Josh


Josh Hursey
jjhur...@open-mpi.org
http://www.open-mpi.org/
