At Mon, 03 Sep 2012 23:07:34 +0900,
MORITA Kazutaka wrote:
>
> At Mon, 03 Sep 2012 21:30:09 +0800,
> Liu Yuan wrote:
> >
> > On 09/03/2012 08:24 PM, MORITA Kazutaka wrote:
> > > No. The reason I doubt keepalive is that, when the trouble happens,
> > > the scripts takes 15 minutes always. I just guess the connection is
> > > closed with another timeout, but I'm not sure. So, I wrote 'perhaps'.
> > >
> > >> >
> > >> > I am not sure, but I think current keepalive implementation looks okay
> > >> > to me, it is simple
> > >> > and efficient. I have tested with various situation besides this
> > >> > script. If there is any
> > >> > problem inside the code, I'd like to fix the bug instead of running
> > >> > away completely from it.
> > > Okay, but in future, it would be considerable to remove TCP keepalive.
> > > The check of node availability is the work of cluster driver.
> >
> > All the hangs is suspected to use RTO instead of keepalive timer. Could you
> > please tell me where
> > the thread is hung at?
>
> It waits for a response from the unreachable node at poll() in
> wait_forward_request(). I'm not sure why it returns after keepalive
s/it returns/it doesn't return/

Thanks,

Kazutaka
-- 
sheepdog mailing list
sheepdog@lists.wpkg.org
http://lists.wpkg.org/mailman/listinfo/sheepdog