>1) what larger set of application problems does this patch address?

>>> For example, for a short lived connection, it was observed that
>>> a REP mad completed with status canceled.  This is normal.  However,
>>> the user already attempted to disconnect the connection by sending
>>> a DREQ.  This left the cep in the DREQ_SENT state by the time that
>>> the REP mad completed.  Since the REP status was not success, but the
>>> state was DREQ_SENT, the code assumed that the DREQ had failed and
>>> transitioned the cep into TIMEWAIT.  The result is that the DREQ is
>>> never matched with a DREP or canceled, but holds a reference on the
>>> CEP.
>>>
>>> Until the DREQ times out (time depends on connection, but easily
>>> up to a minute), attempts to destroy the CEP are blocked.
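
To make the sequence above concrete, here is a minimal sketch in C of the
state handling as described. All names (cep_t, on_send_failed_buggy,
on_send_failed_checked, the enum values) are hypothetical and simplified
for illustration; this is not the actual CM code, and the second handler
is only one possible way to avoid the problem, not necessarily what the
patch does.

```c
/* Hypothetical, simplified model of the CEP send-completion path.
 * None of these names come from the real code base. */
typedef enum { CEP_REP_SENT, CEP_DREQ_SENT, CEP_TIMEWAIT } cep_state_t;
typedef enum { MAD_REP, MAD_DREQ } mad_type_t;
typedef enum { SEND_SUCCESS, SEND_CANCELED, SEND_TIMEOUT } send_status_t;

typedef struct {
    cep_state_t state;
    int dreq_outstanding; /* 1 while a sent DREQ still holds a CEP reference */
} cep_t;

/* Behavior as described in the report: any unsuccessful send completion
 * seen in DREQ_SENT is assumed to be the DREQ failing, so the CEP moves
 * to TIMEWAIT even when the completed MAD was actually the REP. */
void on_send_failed_buggy(cep_t *cep, mad_type_t mad, send_status_t status)
{
    (void)mad; /* the MAD type is not consulted */
    if (status != SEND_SUCCESS && cep->state == CEP_DREQ_SENT) {
        cep->state = CEP_TIMEWAIT; /* DREQ is left outstanding */
    }
}

/* One possible alternative (an assumption, not the actual patch):
 * only treat the completion as a DREQ failure if the MAD is a DREQ. */
void on_send_failed_checked(cep_t *cep, mad_type_t mad, send_status_t status)
{
    if (status == SEND_SUCCESS) {
        return;
    }
    if (cep->state == CEP_DREQ_SENT && mad == MAD_DREQ) {
        cep->dreq_outstanding = 0; /* DREQ is done; drop its reference */
        cep->state = CEP_TIMEWAIT;
    }
    /* A canceled REP arriving in DREQ_SENT is ignored; the DREQ is
     * still in flight and will be matched by a DREP or time out. */
}
```

With the first handler, a canceled REP arriving in DREQ_SENT moves the CEP
to TIMEWAIT while dreq_outstanding is still 1, which corresponds to the
reference that blocks destroying the CEP until the DREQ times out.
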
>2) what type/degree of testing has been successfully passed with this patch
>applied?

It passes using the test that I used to discover and diagnose the problem,
which is ndconn.  I also ran with all other ND tests, several dapl tests,
librdmacm samples, and Intel MPI.

_______________________________________________
ofw mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
