We hit several problems when trying to send large files over IPoIB (each side 
copied a large file shared over NFS):
Problem #1 - ib_post_send returned IB_INVALID_PARAMETER
Problem #2 - IPoIB disappeared on one of the machines during the copy

Problem #2 was a consequence of #1, because:
1. ib_post_send returned IB_INVALID_PARAMETER, and the "hung" flag was set as a 
result.
2. NDIS detected that IPoIB was stuck and sent a restart command, but there was 
a race in the reset flow.

There are 3 patches that fix these problems:
Patch #1:
__ipoib_reset_adapter is started in a separate thread by ipoib_adapter_reset, 
and it changes the value of p_adapter->ipoib_state. At the same time, 
ipoib_adapter_reset calls shutter_shut and also checks and changes 
ipoib_state.
Thus, the race (which we actually hit) is that __ipoib_reset_adapter can 
start running before the call to shutter_shut.
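
To make the ordering problem concrete, here is a minimal single-threaded model 
of it. It is only a sketch, not the driver code: adapter_t, reset_worker, 
queue_work_worst_case and both adapter_reset_* functions are hypothetical 
stand-ins, and the "scheduler" simply runs the work item immediately to model 
the worst case where the worker starts before its creator continues. The 
actual patch may resolve the race differently.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the adapter state; not the real IPoIB types. */
typedef enum { ADAPTER_RUNNING, ADAPTER_RESETTING } adapter_state_t;

typedef struct {
    adapter_state_t state;
    bool shutter_shut;
    bool worker_saw_shutter_shut;
} adapter_t;

typedef void (*work_fn)(adapter_t *);

/* Worst-case scheduling model: the queued work runs right away, as if the
 * worker thread had started before its creator executed another line. */
static void queue_work_worst_case(work_fn fn, adapter_t *a) { fn(a); }

/* Models the reset thread: it expects the shutter to be shut already. */
static void reset_worker(adapter_t *a)
{
    a->worker_saw_shutter_shut = a->shutter_shut;
    /* ... tear down and re-create resources here ... */
    a->shutter_shut = false;       /* reopen the shutter */
    a->state = ADAPTER_RUNNING;    /* reset finished */
}

/* Racy ordering: the worker is queued before the shutter is shut. */
static bool adapter_reset_racy(adapter_t *a)
{
    if (a->state == ADAPTER_RESETTING)
        return false;              /* a reset is already in progress */
    queue_work_worst_case(reset_worker, a);
    a->shutter_shut = true;        /* too late: the worker already ran */
    a->state = ADAPTER_RESETTING;
    return true;
}

/* Fixed ordering: state and shutter are settled before the worker exists. */
static bool adapter_reset_fixed(adapter_t *a)
{
    if (a->state == ADAPTER_RESETTING)
        return false;
    a->state = ADAPTER_RESETTING;
    a->shutter_shut = true;
    queue_work_worst_case(reset_worker, a);
    return true;
}
```

In this model the racy ordering leaves the adapter in ADAPTER_RESETTING with 
the shutter shut and nobody left to reopen it, while the fixed ordering always 
ends in ADAPTER_RUNNING.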

Patch #2:
When IPoIB has to send an NBL with more SG elements than the HW can handle in 
a single send, it switches to the 'send_copy' flow. But send_gen always returns 
a non-success status in this case (caused by the CM flow commit).
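
The intended behaviour can be sketched as follows. This is an illustrative 
model only: HW_MAX_SGE, sg_elem_t, send_status_t and the function signatures 
are assumptions made for the example and do not match the driver's real 
send_gen/send_copy. The point is that when the copy fallback succeeds, its 
success status has to be propagated to the caller.

```c
#include <stddef.h>
#include <string.h>

#define HW_MAX_SGE 4  /* hypothetical per-send SGE limit of the HW */

typedef enum { SEND_OK = 0, SEND_FAILED = 1 } send_status_t;
typedef struct { const void *addr; size_t len; } sg_elem_t;

/* Fallback: flatten all SG elements into one contiguous bounce buffer. */
static send_status_t send_copy(const sg_elem_t *sg, size_t n,
                               unsigned char *bounce, size_t cap,
                               size_t *out_len)
{
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        if (sg[i].len > cap - off)
            return SEND_FAILED;    /* payload does not fit in the buffer */
        memcpy(bounce + off, sg[i].addr, sg[i].len);
        off += sg[i].len;
    }
    *out_len = off;
    return SEND_OK;
}

/* Gather send: use the SG list directly if the HW can take it, otherwise
 * fall back to send_copy and return whatever status the fallback produced. */
static send_status_t send_gen(const sg_elem_t *sg, size_t n,
                              unsigned char *bounce, size_t cap,
                              size_t *out_len, size_t *sge_used)
{
    if (n > HW_MAX_SGE) {
        send_status_t status = send_copy(sg, n, bounce, cap, out_len);
        if (status == SEND_OK)
            *sge_used = 1;         /* one SGE covering the bounce buffer */
        return status;             /* success here must not become failure */
    }
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += sg[i].len;
    *out_len = total;
    *sge_used = n;                 /* SG list fits: post it as is */
    return SEND_OK;
}
```

With six 2-byte elements and a limit of four, this send_gen takes the copy 
path and reports success with a single 12-byte SGE.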

Patch #3:
An invalid max SGE calculation caused ib_post_send to fail.

Alexander (XaleX) Naslednikov
SW Networking Team
Mellanox Technologies
