GitHub user JoshRosen opened a pull request:

    https://github.com/apache/spark/pull/7653

    [SPARK-9328] [WIP] [BRANCH-1.2] Add read timeouts to Netty IO layer

    Spark's Netty-based network layer does not implement read timeouts, which 
may lead to stalls during shuffle: if a remote shuffle server stalls while 
responding to a shuffle block fetch request but does not close the socket, 
the job may block until an OS-level socket timeout occurs.
    
    I think that we can fix this using Netty's ReadTimeoutHandler 
(http://stackoverflow.com/questions/13390363/netty-connecttimeoutmillis-vs-readtimeouthandler).
 The tricky part of working on this will be figuring out the right place to add 
the handler and ensuring that we don't introduce performance issues by not 
re-using sockets.
    
    Quoting from that linked StackOverflow question:
    
    > Note that the ReadTimeoutHandler is also unaware of whether you have sent 
a request - it only cares whether data has been read from the socket. If your 
connection is persistent, and you only want read timeouts to fire when a 
request has been sent, you'll need to build a request / response aware timeout 
handler.
    
    If we want to avoid tearing down connections between shuffles then we may 
have to build a request / response aware timeout handler like the one described 
in that quote.
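    
    To make the request / response aware idea concrete, here is a minimal sketch 
of the bookkeeping such a handler might need. It is plain Java with no Netty 
dependency, and `RequestAwareTimeout` and all of its method names are 
hypothetical; nothing here is existing Spark or Netty code. The assumption is 
that the timeout should only fire while at least one request is outstanding, 
so an idle persistent connection is never torn down.

```java
/**
 * Hypothetical helper (not part of Spark or Netty): tracks outstanding
 * requests and the time of the last read, so that a timeout is reported
 * only while a response is actually being waited for.
 * Callers supply timestamps, e.g. from System.nanoTime().
 */
final class RequestAwareTimeout {
    private final long timeoutNanos;
    private int outstandingRequests = 0;
    private long lastActivityNanos = 0L;

    RequestAwareTimeout(long timeoutNanos) {
        this.timeoutNanos = timeoutNanos;
    }

    /** Call when a request is written to the socket. */
    synchronized void onRequestSent(long nowNanos) {
        // Start the clock only when the connection goes from idle to busy;
        // later requests must not hide a stalled earlier response.
        if (outstandingRequests == 0) {
            lastActivityNanos = nowNanos;
        }
        outstandingRequests++;
    }

    /** Call whenever any bytes are read from the socket. */
    synchronized void onBytesRead(long nowNanos) {
        lastActivityNanos = nowNanos;
    }

    /** Call when a full response has been received for one request. */
    synchronized void onResponseCompleted(long nowNanos) {
        if (outstandingRequests > 0) {
            outstandingRequests--;
        }
        lastActivityNanos = nowNanos;
    }

    /** True only if a request is pending and nothing was read for too long. */
    synchronized boolean isTimedOut(long nowNanos) {
        return outstandingRequests > 0
            && nowNanos - lastActivityNanos > timeoutNanos;
    }
}
```

    In a Netty pipeline this state could presumably be updated from a handler's 
write and read callbacks, with a periodic check (for example one driven by 
Netty's IdleStateHandler) closing the channel when `isTimedOut` returns true. 
That wiring is an assumption and is not shown here.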
    
    This WIP pull request exists to discuss approaches to implementing this 
type of timeout.  I have opened it against `branch-1.2` because I'm trying to 
target a backport patch for a 1.2.x system. Once I've fixed this for branch-1.2 
I will forward-port an updated version of this patch to newer releases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark SPARK-9328-branch-1.2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/7653.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #7653
    
----
commit 04da09c085764349ca618eb543627570da3775fe
Author: Josh Rosen <[email protected]>
Date:   2015-07-24T22:31:59Z

    WIP attempt at implementing socket read timeouts (SPARK-9328)

----


