We faced several problems when trying to send large files over IPoIB (each side copied a large file shared over NFS):

Problem #1 - ib_post_send returned IB_INVALID_PARAMETER
Problem #2 - IPoIB disappeared on one of the machines during the copy

Problem #2 was a consequence of #1:
1. ib_post_send returned IB_INVALID_PARAMETER, and the "hung" flag was set as a result.
2. NDIS detected that IPoIB was stuck and sent a restart command, but there was a race in this flow.

There are 3 patches that fix these problems:

Patch #1: __ipoib_reset_adapter is called in a separate thread from ipoib_adapter_reset, and it changes the value of p_adapter->ipoib_state. On the other hand, ipoib_adapter_reset calls shutter_shut and also checks and changes ipoib_state. Thus the possible race (which did happen) is that __ipoib_reset_adapter starts running before the call to shutter_shut.

Patch #2: When IPoIB has to send an NBL with more SG elements than the HW can handle in one send, it switches to the 'send_copy' flow. But send_gen always returned a non-success status in this case (caused by the CM flow commit).

Patch #3: An invalid max SGE calculation caused ib_post_send to fail.

Alexander (XaleX) Naslednikov
SW Networking Team
Mellanox Technologies
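
For illustration, the send-path decision described under Patch #2 and Patch #3 can be sketched in C roughly as follows. This is only a sketch under stated assumptions, not the driver code: every type and function name here (send_status_t, choose_send_path, send_gather_stub, send_copy_stub, send_gen_sketch) is hypothetical, and the stubs stand in for the real posting logic. It shows two points: the comparison against the HW limit must use a correct max SGE value, and the result of the copy fallback should be returned to the caller instead of an unconditional failure.

```c
#include <stddef.h>

/* Hypothetical types for this sketch; not taken from the IPoIB sources. */
typedef enum { SEND_OK, SEND_FAILED } send_status_t;
typedef enum { PATH_GATHER, PATH_COPY } send_path_t;

/* Pick the send path for a buffer list with n_sg scatter/gather elements.
 * max_sge is the largest number of SGEs the HW accepts in one send; if it
 * is computed too large, the gather path is chosen wrongly and the post
 * fails (the situation Patch #3 describes). */
static send_path_t choose_send_path(size_t n_sg, size_t max_sge)
{
    return (n_sg > max_sge) ? PATH_COPY : PATH_GATHER;
}

/* Stand-in for posting a gather list directly to the HW. */
static send_status_t send_gather_stub(size_t n_sg)
{
    return (n_sg > 0) ? SEND_OK : SEND_FAILED;
}

/* Stand-in for copying the payload into one contiguous buffer and
 * posting it as a single element. */
static send_status_t send_copy_stub(size_t total_len)
{
    return (total_len > 0) ? SEND_OK : SEND_FAILED;
}

/* Sketch of the dispatch: when the copy path is taken, propagate its
 * status rather than always reporting failure. */
static send_status_t send_gen_sketch(size_t n_sg, size_t total_len,
                                     size_t max_sge)
{
    if (choose_send_path(n_sg, max_sge) == PATH_COPY)
        return send_copy_stub(total_len);
    return send_gather_stub(n_sg);
}
```

With a limit of 8 SGEs, a list of 9 elements takes the copy path and the caller sees the copy path's own status; a list of 8 or fewer goes out as a gather list.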
