On Wed, 1 Jul 2009, Ralph Castain wrote:
Okay, let me know. I'll test some more here.
Problem fixed.
Thanks,
george.
Thanks again for catching it.
Ralph
On Wed, 1 Jul 2009, Ralph Castain wrote:
Believe this is now fixed with r21582 - let me know if it now works for you.
On Jul 1, 2009, at 3:28 PM, George Bosilca wrote:
I think I know why it didn't cause problems with SLURM and TORQUE. The
routing was wrong, so the message was at one point forwarded to the HNP.
As the HNP has direct connections with all other processes, it was able to
correctly deliver the message. The only visible impact was 2 more jumps in
f
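The fallback described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Open MPI code: the names `deliver`, `next_hop`, `HNP`, and the daemon labels `d1`..`d3` are hypothetical, and the topology is constructed so that the detour costs exactly two extra hops, as in the message above.

```python
# Hypothetical sketch; none of these names come from Open MPI.
HNP = "hnp"

def deliver(src, dst, next_hop):
    """Follow next-hop entries from src until dst is reached; return the path.

    Assumptions: a daemon with no entry for dst forwards to the HNP, and the
    HNP has a direct connection to every process. Routes are assumed acyclic.
    """
    path = [src]
    current = src
    while current != dst:
        if current == HNP:
            current = dst  # HNP delivers directly
        else:
            current = next_hop[current].get(dst, HNP)
        path.append(current)
    return path

# Correct table: d1 reaches d2 directly.
correct_table = {"d1": {"d2": "d2"}}
# Wrong table: d1 sends traffic for d2 to d3, which has no route to d2.
wrong_table = {"d1": {"d2": "d3"}, "d3": {}}

good_path = deliver("d1", "d2", correct_table)
bad_path = deliver("d1", "d2", wrong_table)
extra_hops = (len(bad_path) - 1) - (len(good_path) - 1)
```

In this sketch the message still arrives with the wrong table, which is consistent with the bug going unnoticed: the only difference is the longer path through the HNP.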
Believe this is now fixed with r21582 - let me know if it now works for you.
Sorry for the problem. It was indeed miscounting the number of daemons in
the system, though apparently this wasn't causing problems for slurm and
torque (still investigating why since it should have). Unfortunately, just
Hmmm...I'll take a look. It seems to be working for me under Torque and
SLURM, though I cannot vouch for the tree launch. The problem with letting
the index start at 0 is it breaks other things, so I'll have to see about
fixing the routing schemes, or find some compromise.
Thanks for the heads up.
Ralph,
This commit breaks several components in OMPI, mainly the routing schemes
and the tree launch. The problem is in the second part of the commit, which
reduces the number of declared daemons by changing the lower bound of the
for loop from 0 to 1. As a result the number
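A minimal sketch of why such a loop-bound change can matter for tree-based routing. Everything here is an assumption made for illustration: `count_daemons`, `children`, and `reachable` are hypothetical helpers, and the binary tree rooted at daemon 0 stands in for whatever routing scheme is actually used.

```python
# Hypothetical sketch; these helpers are not part of Open MPI.

def count_daemons(nodes, start):
    """Count one daemon per node entry, beginning at index `start`."""
    count = 0
    for _ in range(start, len(nodes)):
        count += 1
    return count

def children(rank, num_daemons):
    """Children of `rank` in a binary tree over ranks 0..num_daemons-1."""
    return [c for c in (2 * rank + 1, 2 * rank + 2) if c < num_daemons]

def reachable(num_daemons):
    """Ranks reachable from daemon 0 by following the tree."""
    seen, stack = set(), [0]
    while stack:
        rank = stack.pop()
        if rank not in seen:
            seen.add(rank)
            stack.extend(children(rank, num_daemons))
    return seen

nodes = ["node0", "node1", "node2", "node3"]
declared_from_0 = count_daemons(nodes, 0)  # loop starts at 0
declared_from_1 = count_daemons(nodes, 1)  # loop starts at 1

# Daemons that exist but that the tree never reaches with the smaller count.
missed = set(range(len(nodes))) - reachable(declared_from_1)
```

With the lower bound at 1, the declared count is one short, so in this sketch the highest-ranked daemon is never listed as anyone's child and the tree cannot reach it.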