Pete -

I've been trying to debug an issue where my MD server goes down, or rather times out, closes its connections for some unknown reason, and cancels BMI jobs. While doing so, I ran into a segfault in openib_close_connection:

static void openib_close_connection(ib_connection_t *c)
{
   int ret;
   struct openib_connection_priv *oc = c->priv;

   /* destroy the queue pairs */

<snip>

   free(oc);
}

Since my gdb backtrace doesn't go into any ibv_* functions, I'm assuming this free() call is the culprit. I'm not sure why this free() would segfault, but until we work out why the connections are being closed, it might be a good idea to add a check there to make sure oc is still valid.

Has anyone run into this or other issues with servers going down in openib?

   -- Kyle


--
Kyle Schochenmaier
Research Assistant, Dr. Brett Bode
AmesLab - US Dept. of Energy
Scalable Computing Laboratory
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
