Pete -
I've been trying to debug some issues with my MD server going down, or
rather timing out, closing its connections for some reason, and
canceling BMI jobs. While doing so, I ran into a segfault in
openib_close_connection:
static void openib_close_connection(ib_connection_t *c)
{
    int ret;
    struct openib_connection_priv *oc = c->priv;

    /* destroy the queue pairs */
    <snip>

    free(oc);
}
Since my gdb backtrace doesn't go into any ibv_* functions, I'm assuming
this free() call is the culprit.
I'm not sure why this free() would segfault, but until we can work out
why the connections are being closed in the first place, I think it may
be a good idea to put a check in there to make sure oc is still valid.
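
Roughly what I have in mind is below. This is only a sketch with
made-up stand-in types (conn_sketch, conn_priv_sketch,
close_connection_sketch and demo_double_close are not real BMI names),
and it assumes the crash comes from the close path running twice on the
same connection. A NULL test cannot tell whether a non-NULL pointer has
already been freed, so the useful part is clearing c->priv after the
free(), which turns a second close into a no-op.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-connection private state. */
struct conn_priv_sketch {
    int placeholder;    /* the real struct would hold queue pair state */
};

/* Hypothetical stand-in for the generic connection structure. */
struct conn_sketch {
    void *priv;
};

/* Returns 1 if private state was released, 0 if there was nothing to do. */
int close_connection_sketch(struct conn_sketch *c)
{
    struct conn_priv_sketch *oc;

    /* Nothing to tear down if the connection or its state is gone. */
    if (c == NULL || c->priv == NULL)
        return 0;

    oc = c->priv;

    /* device-specific teardown (queue pairs etc.) would go here */

    free(oc);
    c->priv = NULL;     /* a later close on this connection sees NULL */
    return 1;
}

/* Closes the same connection twice; the second call must be harmless. */
int demo_double_close(void)
{
    struct conn_sketch c;

    c.priv = malloc(sizeof(struct conn_priv_sketch));
    if (c.priv == NULL)
        return -1;

    assert(close_connection_sketch(&c) == 1);
    assert(c.priv == NULL);
    assert(close_connection_sketch(&c) == 0);
    return 0;
}
```

This would not help if something else holds a stale copy of the pointer
or if the heap was corrupted earlier, in which case free() is just
where the damage shows up.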
Has anyone run into this or other issues with servers going down in openib?
-- Kyle
--
Kyle Schochenmaier
[EMAIL PROTECTED]
Research Assistant, Dr. Brett Bode
AmesLab - US Dept.Energy
Scalable Computing Laboratory
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers