A better patch which I hope will apply to CVSHEAD is at:

http://www.scl.ameslab.gov/~troy/pvfs/ibv_post_send/retry-ibv_post_send-cvshead.patch

This also includes several other minor changes I have been dragging along in my tree.

Just hack up anything you like to get it to work.  If it fixes the
situation, we'll go back and clean up the code later.

It is optimistic, what you're trying to do, but I'm not sure if it
will be sufficient.  If there are no credits to get back from
checking the CQ, you'll just deadlock.  I'm also nervous about
locking implications, as you're checking the CQ in the thread that
is trying to do the send.  Not sure if we have done this before.

A simpler way would be just to fail whatever operation got us
into this RDMA, by abandoning it, with another state that says we're
waiting on credits.  An easier first step is just to add lots of
printfs to track the credits and see if you can correlate a credit
overflow with the rdma failures.  If that works, a check at the top
of "post rdma" can say whether we should even bother and we won't
need your fixup step of looking at the CQ from the send.
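
A rough sketch of what that credit tracing and the check at the top of
"post rdma" might look like. Everything here is hypothetical: struct conn,
send_credit, max_credit, trace_credit(), post_rdma_checked() and
return_credit() are made-up names for illustration, not the actual PVFS IB
code, and the real ibv_post_send() call is only marked by a comment.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical per-connection state; the real code keeps its own. */
struct conn {
    int send_credit;  /* sends we may still post before credits come back */
    int max_credit;   /* initial allotment, used for the range check */
};

/* Print the credit count at every change so that a credit problem
 * can be lined up with the rdma failures in the log. */
static void trace_credit(const struct conn *c, const char *where)
{
    fprintf(stderr, "%s: send_credit=%d/%d\n",
            where, c->send_credit, c->max_credit);
    if (c->send_credit < 0 || c->send_credit > c->max_credit)
        fprintf(stderr, "%s: CREDIT OUT OF RANGE\n", where);
}

/* Check at the top of "post rdma": consume a credit and return 0,
 * or return -EAGAIN without posting if there are none left. */
static int post_rdma_checked(struct conn *c)
{
    trace_credit(c, "post_rdma");
    if (c->send_credit <= 0)
        return -EAGAIN;  /* caller moves the op to a waiting-on-credits state */
    c->send_credit--;
    /* the real ibv_post_send() would be called here */
    return 0;
}

/* Called when the peer hands a credit back. */
static void return_credit(struct conn *c)
{
    c->send_credit++;
    trace_credit(c, "return_credit");
}
```

With something like this, the caller that sees -EAGAIN would park the
operation until return_credit() runs, rather than polling the CQ from the
send path.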

                -- Pete

_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers