Here's a little bit more info..
[E 02/19 17:00] max send/recv sge 29 30
[E 02/19 17:01] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 2437.
[E 02/19 17:01] fp_multiqueue_cancel: flow proto cancel called on 0x6216c0
[E 02/19 17:01] handle_io_error: flow proto error cleanup started on 0x6216c0, error_code: -1610613121
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 47602573279216 (LWP 4049)]
memcache_deregister (md=<value optimized out>, buflist=0x664ff8) at ../src/io/bmi/bmi_ib/mem.c:317
317         --c->count;
(gdb) list
312
313     gen_mutex_lock(&memcache_device->mutex);
314     for (i=0; i<buflist->num; i++) {
315 #if ENABLE_MEMCACHE
316         memcache_entry_t *c = buflist->memcache[i];
317         --c->count;
318         debug(2,
319             "%s: dec refcount [%d] %p len %lld (via %p len %lld) refcnt now %d",
320             __func__, i, buflist->buf.send[i], lld(buflist->len[i]),
321             c->buf, lld(c->len), c->count);
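The faulty address 0x20 reported in the signal handler suggests that `buflist->memcache[i]` may be NULL when a cancelled flow reaches deregistration, so `--c->count` dereferences a near-NULL pointer. Here is a minimal sketch of a defensive guard under that assumption; the types below are simplified stand-ins, not the real mem.c structures, and `deregister_sketch` is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the real bmi_ib types in mem.c. */
typedef struct {
    int count;                      /* registration refcount */
} memcache_entry_t;

typedef struct {
    int num;                        /* entries in use */
    memcache_entry_t *memcache[8];  /* per-buffer cache entries */
} buflist_t;

/*
 * Hypothetical guard: skip entries that were never filled in
 * (e.g. a flow cancelled mid-registration) instead of
 * dereferencing a NULL pointer at --c->count.
 */
static void deregister_sketch(buflist_t *buflist)
{
    int i;

    for (i = 0; i < buflist->num; i++) {
        memcache_entry_t *c = buflist->memcache[i];
        if (c == NULL)
            continue;               /* cancelled before caching */
        --c->count;
    }
}
```

Whether skipping is actually safe depends on who owns the pinned memory at cancel time; this only shows where the check would sit.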
Troy Benjegerdes wrote:
Now this is interesting.. This is the memfree hardware, but it looks
like the 'bmi_ib: cancel memcache deregister states' fix didn't quite
do the right thing.
da0:/scratch/1# tail -f pvfs2-server.log
[D 02/19 16:25] PVFS2 Server version 2.7.1pre1-2008-02-19-171553 starting.
[E 02/19 16:25] max send/recv sge 29 30
[E 02/19 16:25] max send/recv sge 29 30
[E 02/19 16:26] max send/recv sge 29 30
[E 02/19 16:27] max send/recv sge 29 30
[E 02/19 16:30] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 9572.
[E 02/19 16:30] fp_multiqueue_cancel: flow proto cancel called on 0x66cbf0
[E 02/19 16:30] handle_io_error: flow proto error cleanup started on 0x66cbf0, error_code: -1610613121
[E 02/19 16:30] PVFS2 server: signal 11, faulty address is 0x20, from 0x429efa
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(memcache_deregister+0x3a) [0x429efa]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(memcache_deregister+0x3a) [0x429efa]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x4295a7]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_thread_mgr_bmi_cancel+0xa6) [0x43aac6]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x42ddb0]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x42e035]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x42e4e7]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_flow_cancel+0x42) [0x42d4d2]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(job_flow_cancel+0x3d) [0x4388cd]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(job_time_mgr_expire+0x1ca) [0x43b18a]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x45b339]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_state_machine_invoke+0xb5) [0x446f35]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_state_machine_next+0xab) [0x4471cb]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_state_machine_continue+0x1e) [0x446dee]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(main+0xe00) [0x410b90]
Pete Wyckoff wrote:
[EMAIL PROTECTED] wrote on Tue, 19 Feb 2008 14:53 -0600:
When migrating some of our machines to the production network, going from memfree Mellanox IB cards to older PCI-X memfull cards, we came across this error on the server. The server is running Debian 2.6.18-5-amd64; the client, a PowerPC node, performed the same test flawlessly against our memfree cards.
[D 12:39:44.257923] PVFS2 Server version 2.7.1pre1-2008-02-19-171553 starting.
[E 13:53:07.910247] Error: openib_post_sr_rdmaw: ibv_post_send (-1).
[..]
Looking at the InfiniBand driver code (we don't have a check for this type of error in PVFS2, and I don't have the IB spec with me right now), I noticed this error code occurs when:
    if (wq_overflow(&qp->sq, nreq, to_mcq(qp->ibv_qp.send_cq))) {
        ret = -1;
        *bad_wr = wr;
        goto out;
    }
Looks like a WQ overflow?
I suppose we could debug inside PVFS2 and decode the bad_wr structure to get more useful information for the future. Do you know if the IB spec treats this as a fatal error?
If it's not a fatal error, though it looks fatal to me, can we attempt to repost the send with a backed-off/smaller SGE list?
Sounds like a WQ overflow to me too. This is in the RDMA code, where it is sometimes necessary to do multiple RDMAs to satisfy one request. There is no global checking against the number of available WQs. I'm not sure why this hasn't come up before. It would be possible to phase this with an extra couple of states. Just not a very fun thing to have to do!
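For what it's worth, the global check Pete describes could be approximated by counting send WRs posted against completions reaped, so a post is deferred once the queue depth is reached. Everything below is a hypothetical sketch, not PVFS2 code; `sq_throttle_t`, `sq_try_post`, and `sq_complete` are invented names, and the real fix would phase the RDMAs through extra flow states instead:

```c
#include <assert.h>

/*
 * Hypothetical throttle: track send WRs posted vs. completed so we
 * never exceed the QP's send queue depth.  max_send_wr would come
 * from the ibv_qp_cap negotiated at QP creation.
 */
typedef struct {
    int max_send_wr;    /* send queue depth from QP creation */
    int outstanding;    /* WRs posted but not yet seen on the CQ */
} sq_throttle_t;

/* Returns 1 if a WR may be posted now, 0 if the caller must defer. */
static int sq_try_post(sq_throttle_t *t)
{
    if (t->outstanding >= t->max_send_wr)
        return 0;       /* posting now would overflow the WQ */
    t->outstanding++;
    return 1;
}

/* Call once per send completion reaped from the CQ. */
static void sq_complete(sq_throttle_t *t)
{
    t->outstanding--;
}
```

A deferred post would then be retried from the completion path, which is exactly the bookkeeping the extra states would carry.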
I'm curious what you have for SG support in that hardware. Can you add some printf around here to show max_send_sge and max_recv_sge? I added an example line; maybe it will work:
    /* compare the caps that came back against what we already have */
    gossip_err("max send/recv sge %d %d\n", att.cap.max_send_sge,
               att.cap.max_recv_sge);
    if (od->sg_max_len == 0) {
        od->sg_max_len = att.cap.max_send_sge;
        if (att.cap.max_recv_sge < od->sg_max_len)
            od->sg_max_len = att.cap.max_recv_sge;
        od->sg_tmp_array = Malloc(od->sg_max_len *
                                  sizeof(*od->sg_tmp_array));
    } else {
        if (att.cap.max_send_sge < od->sg_max_len)
            error("%s: new conn has smaller send SG array size %d vs %d",
                  __func__, att.cap.max_send_sge, od->sg_max_len);
        if (att.cap.max_recv_sge < od->sg_max_len)
            error("%s: new conn has smaller recv SG array size %d vs %d",
                  __func__, att.cap.max_recv_sge, od->sg_max_len);
    }
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers