I agree so much that I just recently filed a bug about this same issue:
https://svn.open-mpi.org/trac/ompi/ticket/1338
Thanks for the feedback, though -- this turns it from a hypothetical
issue into an "it has happened to at least one user" issue...
On Jun 20, 2008, at 8:00 PM, Scott Atchley wrote:
Hi all,
We had a customer using 1.2.6 with MX. We were running his jobs,
some of which used the MX BTL and some used the MX MTL.
He added a few more nodes to the cluster and installed the same
OMPI. When we tried to run jobs that spanned the new nodes, the jobs
failed. I did not keep the error messages, but it seemed to be a
standard message about a component such as "self" not being found.
The problem, in fact, was that he installed OMPI, but for some reason
neither the MX BTL nor the MX MTL was installed. Thus, the failure.
I do not believe the error message for the BTL runs ever
specifically mentioned a missing MX component even though we were
setting "--mca btl self,sm,mx" (we did not specify MX when using the
MTL; we only used "--mca pml cm").
It would be helpful, in the case where OMPI cannot run _and_ a
module is specifically requested but not available, for the missing
module to be mentioned in the error message.
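
As a rough illustration of the request above, here is a minimal Python
sketch of the check: compare the components that were explicitly
requested against the ones actually installed, and name the missing
ones in the error. This is not Open MPI code; the function names and
the list of installed components are hypothetical, chosen only to
mirror the "--mca btl self,sm,mx" case.

```python
def missing_components(requested, available):
    """Return requested component names that are not installed, in request order."""
    available_set = set(available)
    return [name for name in requested if name not in available_set]


def format_error(framework, requested, available):
    """Build an error string naming explicitly requested but missing components.

    Returns None when every requested component is available.
    """
    missing = missing_components(requested, available)
    if not missing:
        return None
    return (
        f"{framework}: requested component(s) not available: {', '.join(missing)} "
        f"(requested: {', '.join(requested)})"
    )


# Hypothetical scenario: "--mca btl self,sm,mx" on a node where the
# MX BTL was never installed (the installed list is an assumption).
requested_btls = "self,sm,mx".split(",")
installed_btls = ["self", "sm", "tcp"]
message = format_error("btl", requested_btls, installed_btls)
print(message)
```

The point of the sketch is only that the error names "mx" directly,
rather than reporting a generic component-not-found failure.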
Thanks,
Scott
--
Jeff Squyres
Cisco Systems