Here's a little bit more info..
[E 02/19 17:00] max send/recv sge 29 30
[E 02/19 17:01] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 2437.
[E 02/19 17:01] fp_multiqueue_cancel: flow proto cancel called on 0x6216c0
[E 02/19 17:01] handle_io_error: flow proto error cleanup started on 0x6216c0, error_code: -1610613121
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 47602573279216 (LWP 4049)]
memcache_deregister (md=<value optimized out>, buflist=0x664ff8) at ../src/io/bmi/bmi_ib/mem.c:317
317         --c->count;
(gdb) list
312
313     gen_mutex_lock(&memcache_device->mutex);
314     for (i=0; i<buflist->num; i++) {
315 #if ENABLE_MEMCACHE
316         memcache_entry_t *c = buflist->memcache[i];
317         --c->count;
318         debug(2,
319             "%s: dec refcount [%d] %p len %lld (via %p len %lld) refcnt now %d",
320             __func__, i, buflist->buf.send[i], lld(buflist->len[i]),
321             c->buf, lld(c->len), c->count);
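The faulty address 0x20 reported in the signal handler suggests that `buflist->memcache[i]` may be NULL when a cancelled flow reaches deregistration, so `--c->count` dereferences a near-NULL pointer. Here is a minimal sketch of a defensive guard under that assumption; the types below are simplified stand-ins, not the real mem.c structures, and `deregister_sketch` is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the real bmi_ib types in mem.c. */
typedef struct {
    int count;                      /* registration refcount */
} memcache_entry_t;

typedef struct {
    int num;                        /* entries in use */
    memcache_entry_t *memcache[8];  /* per-buffer cache entries */
} buflist_t;

/*
 * Hypothetical guard: skip entries that were never filled in
 * (e.g. a flow cancelled mid-registration) instead of
 * dereferencing a NULL pointer at --c->count.
 */
static void deregister_sketch(buflist_t *buflist)
{
    int i;

    for (i = 0; i < buflist->num; i++) {
        memcache_entry_t *c = buflist->memcache[i];
        if (c == NULL)
            continue;               /* cancelled before caching */
        --c->count;
    }
}
```

Whether skipping is actually safe depends on who owns the pinned memory at cancel time; this only shows where the check would sit.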
Troy Benjegerdes wrote:
Now this is interesting.. This is the memfree hardware, but it looks
like the 'bmi_ib: cancel memcache deregister states' fix didn't quite
do the right thing.
da0:/scratch/1# tail -f pvfs2-server.log
[D 02/19 16:25] PVFS2 Server version 2.7.1pre1-2008-02-19-171553 starting.
[E 02/19 16:25] max send/recv sge 29 30
[E 02/19 16:25] max send/recv sge 29 30
[E 02/19 16:26] max send/recv sge 29 30
[E 02/19 16:27] max send/recv sge 29 30
[E 02/19 16:30] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 9572.
[E 02/19 16:30] fp_multiqueue_cancel: flow proto cancel called on 0x66cbf0
[E 02/19 16:30] handle_io_error: flow proto error cleanup started on 0x66cbf0, error_code: -1610613121
[E 02/19 16:30] PVFS2 server: signal 11, faulty address is 0x20, from 0x429efa
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(memcache_deregister+0x3a) [0x429efa]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(memcache_deregister+0x3a) [0x429efa]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x4295a7]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_thread_mgr_bmi_cancel+0xa6) [0x43aac6]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x42ddb0]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x42e035]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x42e4e7]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_flow_cancel+0x42) [0x42d4d2]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(job_flow_cancel+0x3d) [0x4388cd]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(job_time_mgr_expire+0x1ca) [0x43b18a]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x45b339]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_state_machine_invoke+0xb5) [0x446f35]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_state_machine_next+0xab) [0x4471cb]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_state_machine_continue+0x1e) [0x446dee]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(main+0xe00) [0x410b90]
Pete Wyckoff wrote:
[EMAIL PROTECTED] wrote on Tue, 19 Feb 2008 14:53 -0600:
When migrating some of our machines to the production network, going from memfree Mellanox IB cards to older PCI-X memfull cards, we came across this error on the server. The server is running Debian 2.6.18-5-amd64; the client, a PowerPC node, performed the same test flawlessly against our memfree cards.
[D 12:39:44.257923] PVFS2 Server version 2.7.1pre1-2008-02-19-171553 starting.
[E 13:53:07.910247] Error: openib_post_sr_rdmaw: ibv_post_send (-1).
[..]
Looking at the InfiniBand driver code (we don't have a check for this type of error in PVFS2, and I don't have the IB spec with me right now), I noticed this error code occurs when:
    if (wq_overflow(&qp->sq, nreq, to_mcq(qp->ibv_qp.send_cq))) {
        ret = -1;
        *bad_wr = wr;
        goto out;
    }
Looks like a WQ overflow?
I suppose we could debug inside PVFS2 and decode the bad_wr structure to get more useful information for the future. Do you know if the IB spec treats this as a fatal error?
If it's not a fatal error, though it looks fatal to me, can we attempt to repost the send with a backed-off/smaller SGE list?
Sounds like a WQ overflow to me too. This is in the RDMA code, where it is sometimes necessary to do multiple RDMAs to satisfy one request. There is no global checking against the number of available WQs. I'm not sure why this hasn't come up before. It would be possible to phase this with an extra couple of states. Just not a very fun thing to have to do!
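For what it's worth, the global check Pete describes could be approximated by counting send WRs posted against completions reaped, so a post is deferred once the queue depth is reached. Everything below is a hypothetical sketch, not PVFS2 code; `sq_throttle_t`, `sq_try_post`, and `sq_complete` are invented names, and the real fix would phase the RDMAs through extra flow states instead:

```c
#include <assert.h>

/*
 * Hypothetical throttle: track send WRs posted vs. completed so we
 * never exceed the QP's send queue depth.  max_send_wr would come
 * from the ibv_qp_cap negotiated at QP creation.
 */
typedef struct {
    int max_send_wr;    /* send queue depth from QP creation */
    int outstanding;    /* WRs posted but not yet seen on the CQ */
} sq_throttle_t;

/* Returns 1 if a WR may be posted now, 0 if the caller must defer. */
static int sq_try_post(sq_throttle_t *t)
{
    if (t->outstanding >= t->max_send_wr)
        return 0;       /* posting now would overflow the WQ */
    t->outstanding++;
    return 1;
}

/* Call once per send completion reaped from the CQ. */
static void sq_complete(sq_throttle_t *t)
{
    t->outstanding--;
}
```

A deferred post would then be retried from the completion path, which is exactly the bookkeeping the extra states would carry.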
I'm curious what you have for SG support in that hardware. Can you add some printf around here to show max_send_sge and max_recv_sge? I added an example line; maybe it will work:
    /* compare the caps that came back against what we already have */
    gossip_err("max send/recv sge %d %d\n", att.cap.max_send_sge,
               att.cap.max_recv_sge);
    if (od->sg_max_len == 0) {
        od->sg_max_len = att.cap.max_send_sge;
        if (att.cap.max_recv_sge < od->sg_max_len)
            od->sg_max_len = att.cap.max_recv_sge;
        od->sg_tmp_array = Malloc(od->sg_max_len *
                                  sizeof(*od->sg_tmp_array));
    } else {
        if (att.cap.max_send_sge < od->sg_max_len)
            error("%s: new conn has smaller send SG array size %d vs %d",
                  __func__, att.cap.max_send_sge, od->sg_max_len);
        if (att.cap.max_recv_sge < od->sg_max_len)
            error("%s: new conn has smaller recv SG array size %d vs %d",
                  __func__, att.cap.max_recv_sge, od->sg_max_len);
    }
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers