Sean Hefty wrote:

>Currently a DREP is only sent in response to a DREQ if a connection
>has been found matching the DREQ, and it is in the proper state.  Once
>a DREP is sent, the local connection moves into timewait.  Duplicate
>DREQs received while in this state result in re-sending the DREP.
>
>However, it's likely that the local connection will enter and exit
>timewait before the remote side times out a lost DREP and resends a DREQ.
>There are a couple of possible solutions to this.  One is to increase how
>long a connection remains in timewait, by multiplying its wait time by
>max_cm_retries.  This can greatly increase the time a QP spends in timewait
>before it can be re-used, even when no CM messages are lost.
>
>An alternative is to send a DREP in response to a DREQ, even if a local
>connection is not found, which is what this patch does.
>  
>
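
For anyone following along, here is a rough sketch of the behavior described
above. It is a toy Python model, not the kernel CM code: the class, the state
names, and the helper functions are all hypothetical, and "proper state" is
assumed to mean an established connection.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    ESTABLISHED = auto()  # assumed stand-in for "the proper state"
    TIMEWAIT = auto()


@dataclass
class ToyCM:
    """Toy model of the side receiving a DREQ (hypothetical, not the real CM)."""
    reply_when_unknown: bool  # True models the behavior the patch describes
    conns: dict = field(default_factory=dict)  # connection id -> State

    def on_dreq(self, conn_id):
        """Return "DREP" if a reply is sent for this DREQ, else None."""
        state = self.conns.get(conn_id)
        if state is State.ESTABLISHED:
            # Matching connection found: reply and move into timewait.
            self.conns[conn_id] = State.TIMEWAIT
            return "DREP"
        if state is State.TIMEWAIT:
            # Duplicate DREQ while in timewait: re-send the DREP.
            return "DREP"
        # No matching connection, e.g. timewait has already expired.
        return "DREP" if self.reply_when_unknown else None

    def timewait_expired(self, conn_id):
        """Forget the connection once its timewait period ends."""
        self.conns.pop(conn_id, None)


def retried_dreq_answered(reply_when_unknown):
    """Replay the failure case: first DREP lost, retry arrives after timewait."""
    cm = ToyCM(reply_when_unknown, {1: State.ESTABLISHED})
    cm.on_dreq(1)           # first DREP is sent but assumed lost on the wire
    cm.timewait_expired(1)  # local side exits timewait before the retry
    return cm.on_dreq(1) == "DREP"
```

With reply_when_unknown=False the retried DREQ goes unanswered, so the remote
side keeps retrying until it gives up; with reply_when_unknown=True it gets a
DREP even though the local connection is already gone.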

Would it be possible to get this fix into rc7? I am consistently seeing
this problem with Intel MPI on a 64-node cluster.

-arlin

_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general