Alright, great, I'm making some progress.

I did a simple copy/paste modification and recompiled Mesos. The keepalive 
timer is now set on the slave-to-master connection, so this is an improvement 
for me. I haven't tested the other direction yet.
https://gist.github.com/jolexa/ee9e152aa7045c558e02
After some real-world testing I'd like to file an enhancement request for 
this, since it seems like an improvement for other people as well.
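
For anyone following along, a change of this kind might look like the minimal 
sketch below. It is not the patch from the gist above and it is not libprocess 
code: enable_keepalive and keepalive_enabled are hypothetical helper names, and 
the only assumption is a POSIX socket descriptor. It turns SO_KEEPALIVE on and 
reads the option back to confirm that it took effect.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper (not Mesos or libprocess code): enable TCP keepalive
 * on an existing socket. Returns 0 on success, -1 on failure. */
int enable_keepalive(int fd)
{
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
}

/* Read the option back. Returns 1 if keepalive is enabled, 0 if it is
 * disabled, and -1 if the option could not be read. */
int keepalive_enabled(int fd)
{
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &value, &len) != 0) {
        return -1;
    }
    return value != 0;
}
```

Once the option is set on a connection, netstat --timers should report a 
keepalive timer for it instead of "off".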


I'm having a harder time figuring out the zk client code. I started by 
modifying build/3rdparty/zookeeper-3.4.5/src/c/zookeeper.c, but either a) my 
change wasn't correct or b) I'm modifying the wrong file, since I just assumed 
Mesos uses the C client. Is this the correct place?


Thanks much,

Jeremy


________________________________
From: Jojy Varghese <j...@mesosphere.io>
Sent: Monday, November 9, 2015 2:09 PM
To: user@mesos.apache.org
Subject: Re: Mesos and Zookeeper TCP keepalive

Hi Jeremy
 The "network" code is at "3rdparty/libprocess/include/process/network.hpp", 
"3rdparty/libprocess/src/poll_socket.hpp/cpp".

thanks
jojy


On Nov 9, 2015, at 6:54 AM, Jeremy Olexa 
<jol...@spscommerce.com> wrote:

Hi all,

Jojy, that is correct, but more specifically a keepalive timer from slave to 
master and from slave to zookeeper. Can you send a link to the portion of the 
code that builds the socket/connection? In your opinion, is there any reason 
not to set the SO_KEEPALIVE option?

haosdent, I'm not looking for keepalive between zk quorum members, which is 
what the ZOOKEEPER JIRA refers to.

Thanks,
Jeremy


________________________________
From: Jojy Varghese <j...@mesosphere.io>
Sent: Sunday, November 8, 2015 8:37 PM
To: user@mesos.apache.org
Subject: Re: Mesos and Zookeeper TCP keepalive

Hi Jeremy
  Are you trying to establish a keepalive timer between mesos master and mesos 
slave? If so, I don't believe it's possible today, as the SO_KEEPALIVE option 
is not set on an accepting socket.

-Jojy

On Nov 8, 2015, at 8:43 AM, haosdent 
<haosd...@gmail.com> wrote:

I think the keepalive option should be set in ZooKeeper, not in Mesos. See 
this related issue in ZooKeeper:
https://issues.apache.org/jira/browse/ZOOKEEPER-2246?focusedCommentId=14724085&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14724085

On Sun, Nov 8, 2015 at 4:47 AM, Jeremy Olexa 
<jol...@spscommerce.com> wrote:
Hello all,

We have been fighting some network/session disconnection issues between 
datacenters, and I'm curious whether there is any way to enable TCP keepalive 
on the zookeeper/mesos sockets. If there were a way, then the sysctl TCP kernel 
settings would be used. I believe keepalive has to be enabled by the software 
that opens the connection. (That is my understanding, anyway.)
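
To illustrate that understanding: on Linux, once an application sets 
SO_KEEPALIVE on a socket, the idle time, probe interval and probe count come 
from the net.ipv4.tcp_keepalive_time, net.ipv4.tcp_keepalive_intvl and 
net.ipv4.tcp_keepalive_probes sysctls unless the application overrides them 
for that socket. The sketch below is a minimal, Linux-specific example of such 
an override. The helper names set_keepalive_params and get_tcp_option are 
hypothetical, and none of this is taken from Mesos or ZooKeeper.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Hypothetical helper, Linux-specific: enable keepalive and override the
 * sysctl defaults for this one socket. Returns 0 on success, -1 on failure. */
int set_keepalive_params(int fd, int idle_secs, int interval_secs, int probes)
{
    int on = 1;

    /* Without SO_KEEPALIVE the three TCP-level settings below are stored
     * but no probes are ever sent. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0) {
        return -1;
    }
    /* Seconds of idleness before the first probe. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof(idle_secs)) != 0) {
        return -1;
    }
    /* Seconds between successive probes. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &interval_secs, sizeof(interval_secs)) != 0) {
        return -1;
    }
    /* Unanswered probes before the connection is dropped. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                   &probes, sizeof(probes)) != 0) {
        return -1;
    }
    return 0;
}

/* Read back one integer TCP-level option. Returns its value, or -1 if the
 * option could not be read. */
int get_tcp_option(int fd, int option)
{
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, IPPROTO_TCP, option, &value, &len) != 0) {
        return -1;
    }
    return value;
}
```

If only SO_KEEPALIVE is set and the three TCP-level options are left alone, 
the sysctl values apply, which is the behavior asked about here.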

Here is what I see via netstat --timers -tn:
tcp        0      0 172.18.1.1:55842      10.10.1.1:2181      ESTABLISHED off (0.00/0/0)
tcp        0      0 172.18.1.1:49702      10.10.1.1:5050      ESTABLISHED off (0.00/0/0)


Where 172 is the mesos-slave network and 10 is the mesos-master network. The 
"off" keyword means that keepalives are not being sent.

I've trawled through JIRA, git, etc. and cannot easily determine whether this 
is expected behavior or should be an enhancement request. Any ideas?

Thanks much!
-Jeremy




--
Best Regards,
Haosdent Huang
