Pete -
While migrating some of our machines to the production network, going
from memfree Mellanox IB cards to older PCI-X memfull cards, we came
across the error below on the server. The server runs Debian with a
2.6.18-5-amd64 kernel; the client is a PowerPC node, which performed
the same test flawlessly against our memfree cards.


[D 12:39:44.257923] PVFS2 Server version 2.7.1pre1-2008-02-19-171553 starting.
[E 13:53:07.910247] Error: openib_post_sr_rdmaw: ibv_post_send (-1).
[E 13:53:07.934403]     [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(error+0xca) [0x429d2a]
[E 13:53:07.934447]     [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x42b3fa]
[E 13:53:07.934457]     [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x426acf]
[E 13:53:07.934465]     [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x428c35]
[E 13:53:07.934473]     [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(BMI_testcontext+0xea) [0x42533a]
[E 13:53:07.934481]     [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x43a340]
[E 13:53:07.934490]     [bt] /lib/libpthread.so.0 [0x2ae2c1e60f1a]
[E 13:53:07.934498]     [bt] /lib/libc.so.6(__clone+0x72) [0x2ae2c224e6c2]

Since we don't have a check for this type of error in pvfs2, and I
don't have the IB spec with me right now, I looked at the InfiniBand
driver code and noticed that this error code is returned when:

                if (wq_overflow(&qp->sq, nreq, to_mcq(qp->ibv_qp.send_cq))) {
                        ret = -1;
                        *bad_wr = wr;
                        goto out;
                }

Looks like a WQ overflow?
I suppose we could debug inside pvfs2 and decode the bad_wr structure
to get more useful information in the future. Do you know if the IB
spec treats this as a fatal error?
If it's not a fatal error - though it looks fatal to me - can we
attempt to repost the send with a backed-off/smaller sge list?
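
To make the retry idea concrete, here is a rough sketch in plain C. It is
only a toy model, not pvfs2 or driver code: toy_wq, toy_post_send, toy_reap
and toy_post_with_retry are all names I made up, and the overflow test just
mirrors the shape of the driver check quoted above. If I'm reading that
check right, it counts outstanding work requests rather than sge entries,
so a smaller sge list alone may not help; the sketch instead frees a slot
by reaping a completion (what polling the send CQ with ibv_poll_cq would do
in real code) and then posts again.

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-in for a send work queue; NOT the real driver structure. */
struct toy_wq {
    unsigned head;  /* total work requests posted so far */
    unsigned tail;  /* total work requests completed so far */
    unsigned max;   /* queue depth */
};

/* Same shape as the driver check: nreq is the number of requests
 * already staged in the current call. */
static int toy_wq_overflow(const struct toy_wq *wq, unsigned nreq)
{
    return (wq->head - wq->tail) + nreq >= wq->max;
}

/* Hypothetical post of a single request: -1 on overflow, 0 on success. */
static int toy_post_send(struct toy_wq *wq)
{
    assert(wq->max > 0);
    if (toy_wq_overflow(wq, 0))
        return -1;
    wq->head++;
    return 0;
}

/* Hypothetical completion reaping: retires up to n outstanding requests
 * and returns how many were retired. */
static unsigned toy_reap(struct toy_wq *wq, unsigned n)
{
    unsigned outstanding = wq->head - wq->tail;
    unsigned done = n < outstanding ? n : outstanding;
    wq->tail += done;
    return done;
}

/* Try to post; on overflow reap one completion and try again,
 * giving up after max_tries attempts or when nothing can be reaped. */
static int toy_post_with_retry(struct toy_wq *wq, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        if (toy_post_send(wq) == 0)
            return 0;
        if (toy_reap(wq, 1) == 0)
            break;  /* nothing completed, so retrying cannot help */
    }
    return -1;
}
```

In the real server the reap step can come back empty because completions
have not arrived yet, so a bounded retry count (or queueing the send until
a completion shows up) would still be needed.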

Thanks,

Kyle



-- 
Kyle Schochenmaier