Just hack up anything you like to get it to work. If it fixes the situation, we'll go back and clean up the code later.

What you're trying to do is optimistic, but I'm not sure it will be sufficient. If there are no credits to get back from checking the CQ, you'll just deadlock. I'm also nervous about the locking implications, since you're checking the CQ in the thread that is trying to do the send. I'm not sure we have done this before.

A simpler way would be to just fail whatever operation got us into this RDMA, by abandoning it, with another state that says we're waiting on credits.

An easier first step is to add lots of printfs to track the credits and see if you can correlate a credit overflow with the RDMA failures. If that works, a check at the top of "post rdma" can say whether we should even bother, and we won't need your fixup step of looking at the CQ from the send.

-- Pete
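For illustration only, here is a rough sketch of the printf-based credit tracking and the check at the top of "post rdma" described above. It is not taken from the PVFS2 source or from the diff in this thread: the struct, the function names, and the idea of one credit per outstanding send are all assumptions made for the example, and it deliberately does not call any verbs functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-connection send-credit counter (not a PVFS2 structure). */
struct credit_tracker {
    int max_credits;  /* assumed limit on outstanding sends */
    int available;    /* credits currently free */
    int refused;      /* posts refused because no credit was free */
};

static void credit_init(struct credit_tracker *ct, int max_credits)
{
    ct->max_credits = max_credits;
    ct->available = max_credits;
    ct->refused = 0;
}

/*
 * Meant to be called at the top of a "post rdma" path: take one credit,
 * or refuse so the caller can park the operation in a
 * "waiting on credits" state instead of posting.
 */
static bool credit_try_take(struct credit_tracker *ct, const char *who)
{
    if (ct->available == 0) {
        ct->refused++;
        fprintf(stderr, "%s: no credits left (max %d), refused %d so far\n",
                who, ct->max_credits, ct->refused);
        return false;
    }
    ct->available--;
    fprintf(stderr, "%s: took a credit, %d of %d left\n",
            who, ct->available, ct->max_credits);
    return true;
}

/* Meant to be called once for each send completion reaped from the CQ. */
static void credit_return(struct credit_tracker *ct, const char *who)
{
    /* Returning more credits than were taken would be an accounting bug. */
    assert(ct->available < ct->max_credits);
    ct->available++;
    fprintf(stderr, "%s: returned a credit, %d of %d left\n",
            who, ct->available, ct->max_credits);
}
```

With something like this in place, the stderr trace can be lined up against the RDMA failures in the server logs; if the failures always follow a "no credits left" line, the early check alone may be enough and the send path would not need to poll the CQ itself.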
This seems to work a little better:

http://www.scl.ameslab.gov/~troy/pvfs/ibv_post_send/retry-ibv_post_send.diff

and gives output like this:

http://www.scl.ameslab.gov/~troy/pvfs/ibv_post_send/pvfs2-server-ib-da1.log
http://www.scl.ameslab.gov/~troy/pvfs/ibv_post_send/pvfs2-server-ib-da3.log

_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
