Changqing> Is there a commonly recommended value for this timeout?
    Changqing> I use 18, which represents about 1 second.

18 should be OK I guess, unless you have congestion in your fabric, in
which case you have other problems anyway.
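
For reference, here is a rough sketch of where "18 is about 1 second"
comes from. It assumes the timeout in question is the InfiniBand QP
local ACK timeout, a 5-bit field that encodes 4.096 usec * 2^value,
with 0 meaning no timeout. The helper name ack_timeout_seconds is
made up for illustration and is not part of any library.

```python
# Assumption: "this timeout" is the InfiniBand QP local ACK timeout,
# a 5-bit field encoding 4.096 usec * 2**value (0 means no timeout).

BASE_USEC = 4.096  # base unit of the encoding, in microseconds


def ack_timeout_seconds(value: int) -> float:
    """Convert an encoded timeout value to seconds (hypothetical helper)."""
    if not 0 <= value <= 31:
        raise ValueError("timeout field is 5 bits wide (0..31)")
    if value == 0:
        return float("inf")  # 0 disables the timeout
    return BASE_USEC * (2 ** value) / 1e6


if __name__ == "__main__":
    # Print a few encoded values next to the time they stand for.
    for v in (14, 18, 20):
        print(v, round(ack_timeout_seconds(v), 3), "seconds")
```

Under that assumption each step up doubles the timeout, so 18 works
out to roughly 1.07 seconds and 20 to roughly 4.3 seconds.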

    Changqing> It is very hard to reproduce this error with standalone
    Changqing> code. I use HP-MPI and need 8 ranks on at least 4 nodes
    Changqing> with 2 cards in each node, and only one of our hundred
    Changqing> test codes catches this error, in an MPI_Scatterv
    Changqing> operation.

Unless you can narrow down a way to reproduce this, I don't think it's
going to be possible for anyone to help debug it.

 - R.

