Arlin Davis wrote:

>Sean Hefty wrote:
>
>>Currently a DREP is only sent in response to a DREQ if a connection
>>has been found matching the DREQ, and it is in the proper state. Once
>>a DREP is sent, the local connection moves into timewait. Duplicate
>>DREQs received while in this state result in re-sending the DREP.
>>
>>However, it's likely that the local connection will enter and exit
>>timewait before the remote side times out a lost DREP and resends a DREQ.
>>There are a couple possible solutions to this. One is to increase how
>>long a connection remains in timewait, by multiplying its wait time by
>>max_cm_retries. This can greatly increase the timewait state before a QP
>>can be re-used when CM messages are not lost.
>>
>>An alternative is to send a DREP in response to a DREQ, even if a local
>>connection is not found, which is what this patch does.
>>
>
>Would it be possible to get this fix in rc7? I am consistently seeing
>this problem with Intel MPI on a 64 node cluster.
>
>-arlin
>

Aviram? Is there an rc7 and could this get in?
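
For anyone following along, the race Sean describes can be illustrated with a small Python sketch. This is a toy model only, not the ib_cm code: the class `ToyCM`, the `reply_when_unknown` flag, and the two-state enum are all made up for illustration, and ESTABLISHED is assumed to be the "proper state" for accepting a DREQ.

```python
from enum import Enum, auto


class State(Enum):
    ESTABLISHED = auto()
    TIMEWAIT = auto()


class ToyCM:
    """Hypothetical toy model of the passive side of a disconnect."""

    def __init__(self, reply_when_unknown):
        # reply_when_unknown=True models the patched behavior:
        # answer a DREQ even when no matching connection exists.
        self.reply_when_unknown = reply_when_unknown
        self.conns = {}  # connection id -> State

    def connect(self, conn_id):
        self.conns[conn_id] = State.ESTABLISHED

    def timewait_expired(self, conn_id):
        # Leaving timewait destroys the local connection record.
        self.conns.pop(conn_id, None)

    def handle_dreq(self, conn_id):
        """Return True if a DREP is sent in response to this DREQ."""
        state = self.conns.get(conn_id)
        if state is State.ESTABLISHED:
            self.conns[conn_id] = State.TIMEWAIT
            return True
        if state is State.TIMEWAIT:
            return True  # duplicate DREQ: resend the DREP
        return self.reply_when_unknown  # no matching connection found


def scenario(reply_when_unknown):
    cm = ToyCM(reply_when_unknown)
    cm.connect(7)
    first = cm.handle_dreq(7)   # DREP sent, but assume it is lost
    cm.timewait_expired(7)      # local side exits timewait first
    retry = cm.handle_dreq(7)   # remote times out and resends the DREQ
    return first, retry


if __name__ == "__main__":
    print("current:", scenario(False))
    print("patched:", scenario(True))
```

With `reply_when_unknown=False` the retried DREQ goes unanswered once the local side has left timewait; with `True` the retry still gets a DREP, without lengthening timewait.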
_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
