>
>'retries exceeded' means that the transport retry count was 
>exceeded, so most likely your timeout is set too low.

Is there a commonly recommended value for this timeout? I use 18, which
represents about 1 second.

>
>Without seeing your code, I couldn't begin to say why you 
>don't see a send completion.  If you are absolutely positive 
>that you post a send and you never see a completion for that 
>send, then I guess it is a firmware or hardware problem.

It is very hard to reproduce this error with standalone code. I use
HP-MPI and need 8 ranks on at least 4 nodes with 2 cards on each node,
and just one of our hundred test codes can catch this error, and it is
in the MPI_Scatterv operation.

--CQ


>
> - R.
>

_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general
