Hi Kevin,

Are you getting those messages from ompi_info? Or from an MPI app (and if so, what are you doing to get them)?
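For reference while comparing the numbers in the quoted message, here is a rough sketch of what a btl_openib_ib_timeout of 10 versus 20 works out to in wall-clock terms. It assumes the usual InfiniBand local ACK timeout formula, 4.096 microseconds * 2^timeout, which is my understanding of how this parameter is interpreted; the helper name `local_ack_timeout_seconds` is made up for illustration and is not part of Open MPI.

```python
# Sketch only: converts a btl_openib_ib_timeout setting into seconds.
# Assumption: the InfiniBand local ACK timeout is 4.096 us * 2**timeout.

BASE_MICROSECONDS = 4.096


def local_ack_timeout_seconds(ib_timeout: int) -> float:
    """Approximate local ACK timeout in seconds (hypothetical helper)."""
    # The field is 5 bits wide; 0 is excluded here because InfiniBand
    # treats it as "wait indefinitely" rather than as a duration.
    if not 1 <= ib_timeout <= 31:
        raise ValueError("ib_timeout expected in the range 1..31")
    return BASE_MICROSECONDS * (2 ** ib_timeout) / 1_000_000


if __name__ == "__main__":
    for value in (10, 20):
        seconds = local_ack_timeout_seconds(value)
        print(f"btl_openib_ib_timeout={value}: {seconds:.6f} s")
```

Under that assumption, a value of 10 is roughly 4.2 milliseconds and a value of 20 is roughly 4.3 seconds, so which of the two is in effect makes a large difference to how quickly a retry is triggered.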
On Sep 11, 2011, at 5:25 PM, kevin.buck...@ecs.vuw.ac.nz wrote:

> I have recently seen some OpenIB time out errors and see the
> following reported:
>
> * btl_openib_ib_retry_count - The number of times the sender will
>   attempt to retry (defaulted to 7, the maximum value).
> * btl_openib_ib_timeout - The local ACK timeout parameter (defaulted
>   to 10). The actual timeout value used is calculated as:
>
> I'd like to confirm that, when those messages say "defaulted to",
> they are telling me what's happening on the node in question and
> not just what the default is.
>
> Reason for asking is that I believe that I am setting the values of
> btl_openib_ib_timeout to 20, globally, as suggested in areas of the
> OpenMPI docs but those messages, if they do report what's happening,
> might be telling me otherwise.
>
> In case it is relevant, the OpenMPI in question is the bog standard
> RHEL5 1.4.4.
>
> --
> Kevin M. Buckley                  Room:  CO327
> School of Engineering and         Phone: +64 4 463 5971
>  Computer Science
> Victoria University of Wellington
> New Zealand
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users