GitHub user peshopetrov opened a pull request:

    https://github.com/apache/spark/pull/20512

    [SPARK-23182][CORE] Allow enabling TCP keep alive on the master RPC connections.

    ## What changes were proposed in this pull request?
    
    Make it possible for the master to enable TCP keep alive on its RPC
    connections with clients.
    
    ## How was this patch tested?
    
    Manually tested.
    
    Added the following line to spark-defaults.conf:
    
    spark.rpc.io.enableTcpKeepAlive  true
    
    Observed the following:
    # netstat -town | grep 7077
    tcp6       0      0 10.240.3.134:7077       10.240.1.25:42851       ESTABLISHED keepalive (6736.50/0/0)
    tcp6       0      0 10.240.3.134:44911      10.240.3.134:7077       ESTABLISHED keepalive (4098.68/0/0)
    tcp6       0      0 10.240.3.134:7077       10.240.3.134:44911      ESTABLISHED keepalive (4098.68/0/0)
    
    The keepalive timers on these connections show that the setting is taking
    effect.
    
    
    It's currently possible to enable TCP keep alive on the worker / executor,
    but not on the master, and it's unclear to me why. Keep alive matters more
    on the master, because it protects the master against workers / executors
    that depart suddenly, so I think it's very important to have it. In
    particular, it makes the master resilient when using preemptible worker
    VMs in GCE. GCE has the concept of shutdown scripts, but it doesn't
    guarantee that they run. As a result, workers often don't get shut down
    gracefully, and their TCP connections linger on the master because nothing
    closes them. Hence the need to enable keep alive.
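
As an illustration of what such a flag amounts to at the socket level, here is a minimal sketch using only the plain `java.net` API. It is not the code from this patch: the class name `KeepAliveSketch` and the helper `acceptWithKeepAlive` are hypothetical, and Spark's RPC layer is not used here at all. The sketch only shows a listening side turning on `SO_KEEPALIVE` for a connection it has accepted, so that the operating system can eventually detect a peer that vanished without closing the connection.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class KeepAliveSketch {

    // Hypothetical helper: accepts one connection and applies the flag to it.
    // The boolean stands in for a setting such as spark.rpc.io.enableTcpKeepAlive.
    static Socket acceptWithKeepAlive(ServerSocket server, boolean enableTcpKeepAlive)
            throws IOException {
        Socket accepted = server.accept();
        // Sets SO_KEEPALIVE on the accepted (server-side) socket only.
        accepted.setKeepAlive(enableTcpKeepAlive);
        return accepted;
    }

    public static void main(String[] args) throws IOException {
        // Listen on an ephemeral loopback port so the sketch runs anywhere.
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = acceptWithKeepAlive(server, true)) {
            System.out.println("SO_KEEPALIVE on accepted socket: " + accepted.getKeepAlive());
        }
    }
}
```

The flag only switches keep alive probing on for that socket. How long a connection must be idle before probes start, and how many failed probes close it, is decided by the operating system's TCP settings rather than by this call.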


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/peshopetrov/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20512.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20512
    
----
commit c5e2d98b9e98fd3416a36ab91262260146bf4ac5
Author: Petar Petrov <petar.petrov@...>
Date:   2018-01-23T09:02:41Z

    [SPARK-23182][CORE] Allow enabling TCP keep alive on the master RPC connections.

----

