Or Gerlitz wrote:
> My understanding is that without this patch the side that sends the DREQ
> would do few DREQ resends as of the "firsts" DREPs being lost and no
> DREPs sent once the id at the peer side left the timewait state, correct?
This is correct. Note that the number of DREQ retries has now been changed to 15.

> Can you please share what were the implications with intel MPI running a
> 64 nodes (128 ranks?) job? was the issue here just making the ***job
> termination time*** bigger?

Job termination was taking about a minute while waiting for the DREQ to
time out. When running a series of tests, that delay adds up and becomes a
fairly large issue.

- Sean

_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
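To make the failure mode in the quoted question concrete, here is a minimal Python simulation, not the CM code itself. It assumes that a DREP arrives instantly whenever the peer answers, that the first few DREPs are lost, that the peer stops answering once its id leaves timewait, and that every unanswered attempt costs one full timeout before the next resend. The names `Peer` and `dreq_wait_time`, the `reply_after_timewait` toggle, and the 4-second per-attempt timeout are all hypothetical; only the 15 retries come from the thread.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    """Hypothetical model of the passive side of a disconnect.

    The peer answers a DREQ with a DREP only while its connection id is
    still in the timewait state, as described in the quoted question.
    Setting reply_after_timewait=True models, for comparison only, a peer
    that keeps answering after timewait.
    """
    timewait_remaining: float          # seconds until the id leaves timewait
    reply_after_timewait: bool = False

    def answers(self, now: float) -> bool:
        return now < self.timewait_remaining or self.reply_after_timewait


def dreq_wait_time(peer: Peer, timeout: float, max_retries: int,
                   lost_dreps: int) -> tuple[float, bool]:
    """Return (seconds spent waiting, whether a DREP was received).

    Assumptions: a DREP arrives instantly when the peer answers, the first
    `lost_dreps` DREPs are dropped, and each unanswered attempt costs one
    full `timeout` before the next resend.
    """
    now = 0.0
    for attempt in range(max_retries + 1):  # initial DREQ plus retries
        if attempt >= lost_dreps and peer.answers(now):
            return now, True
        now += timeout  # no DREP seen: wait out the timeout, then resend
    return now, False


if __name__ == "__main__":
    # Illustrative numbers only: 15 retries as mentioned in the thread,
    # an arbitrary 4 s per-attempt timeout, and one lost DREP.
    stuck = Peer(timewait_remaining=1.0)
    print(dreq_wait_time(stuck, timeout=4.0, max_retries=15, lost_dreps=1))
```

With these made-up numbers the sender exhausts all 16 attempts and gives up after 64 s, while a peer that keeps answering (`reply_after_timewait=True`) ends the wait after a single resend. Since the timeout value is invented, only the shape of the result is meaningful: an unanswered DREQ costs roughly (retries + 1) × timeout.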
