Re: [OMPI devel] [OMPI svn] svn:open-mpi r21548

2009-07-01 Thread George Bosilca
On Wed, 1 Jul 2009, Ralph Castain wrote: Okay, let me know. I'll test some more here. Problem fixed. Thanks, george. Thanks again for catching it. Ralph Thanks, george. On Wed, 1 Jul 2009, Ralph Castain wrote: Believe this is now fixed with r21582 - let me know if it now wor

Re: [OMPI devel] [OMPI svn] svn:open-mpi r21548

2009-07-01 Thread Ralph Castain
On Jul 1, 2009, at 3:28 PM, George Bosilca wrote: I think I know why it didn't cause problems with SLURM and TORQUE. The routing was wrong, so the message was at one point forwarded to the HNP. As the HNP has direct connections with all other processes, it was able to correctly deliver the

Re: [OMPI devel] [OMPI svn] svn:open-mpi r21548

2009-07-01 Thread George Bosilca
I think I know why it didn't cause problems with SLURM and TORQUE. The routing was wrong, so the message was at one point forwarded to the HNP. As the HNP has direct connections with all other processes, it was able to correctly deliver the message. The only visible impact was 2 more jumps in f

Re: [OMPI devel] [OMPI svn] svn:open-mpi r21548

2009-07-01 Thread Ralph Castain
Believe this is now fixed with r21582 - let me know if it now works for you. Sorry for the problem. It was indeed miscounting the number of daemons in the system, though apparently this wasn't causing problems for SLURM and TORQUE (still investigating why, since it should have). Unfortunately, just

Re: [OMPI devel] [OMPI svn] svn:open-mpi r21548

2009-07-01 Thread Ralph Castain
Hmmm...I'll take a look. It seems to be working for me under Torque and SLURM, though I cannot vouch for the tree launch. The problem with letting the index start at 0 is it breaks other things, so I'll have to see about fixing the routing schemes, or find some compromise. Thanks for the heads up.

Re: [OMPI devel] [OMPI svn] svn:open-mpi r21548

2009-07-01 Thread George Bosilca
Ralph, This commit breaks several components in OMPI, mainly the routing schemes and the tree launch. The problem is the reduction in the number of declared daemons in the second part of the commit, where you change the lower bound of the for loop from 0 to 1. As a result the number