GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/9917

    [SPARK-11866] [network] [core] Make sure timed out RPCs are cleaned up.

    This change does a couple of different things to make sure that the
    RpcEnv-level code and the network library agree about the status of
    outstanding RPCs.
    
    For RPCs that do not expect a reply ("RpcEnv.send"), support for one-way
    messages (hello CORBA!) was added to the network layer. This is a
    "fire and forget" message that does not require any state to be kept
    by the TransportClient; as a result, the RpcEnv 'Ack' message is not needed
    anymore.
    
    For RPCs that do expect a reply ("RpcEnv.ask"), the network library now
    returns the internal RPC id; if the RpcEnv layer decides to time out the
    RPC before the network layer does, it now asks the TransportClient to
    forget about the RPC, so that if the network-level timeout occurs, the
    client is not killed.
    
    As part of implementing the above, I cleaned up some of the code in the
    netty rpc backend, removing types that were not necessary and factoring
    out some common code. Of interest is a slight change in the exceptions
    when posting messages to a stopped RpcEnv; that's mostly to avoid nasty
    error messages from the local-cluster backend when shutting down, which
    pollute the terminal output.
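
The two message paths described above can be sketched with a toy model. This is only an illustration of the bookkeeping, not Spark's code: `ToyTransportClient`, `send_rpc`, `remove_rpc_request`, and `on_network_timeout` are hypothetical names, and the rule that a network-level timeout is fatal only while RPCs are outstanding is an assumption of the sketch.

```python
import itertools


class ToyTransportClient:
    """Hypothetical stand-in for a network-layer client (not Spark's class)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.outstanding = {}  # rpc id -> callback still waiting for a reply
        self.wire = []         # messages "written" to the connection
        self.alive = True

    def send(self, payload):
        """One-way message: no id, no callback, no state kept by the client."""
        self.wire.append(("one-way", None, payload))

    def send_rpc(self, payload, callback):
        """Message that expects a reply; the id is returned so it can be cancelled."""
        rpc_id = next(self._ids)
        self.outstanding[rpc_id] = callback
        self.wire.append(("rpc", rpc_id, payload))
        return rpc_id

    def remove_rpc_request(self, rpc_id):
        """Forget an RPC that the upper layer has already timed out."""
        self.outstanding.pop(rpc_id, None)

    def on_network_timeout(self):
        """Assumed rule: the timeout kills the client only if RPCs are pending."""
        if self.outstanding:
            self.alive = False
            for callback in list(self.outstanding.values()):
                callback(None, TimeoutError("network-level timeout"))
            self.outstanding.clear()
        return self.alive


def demo():
    client = ToyTransportClient()
    client.send(b"fire-and-forget")             # leaves nothing to clean up
    rpc_id = client.send_rpc(b"ask", lambda result, error: None)
    client.remove_rpc_request(rpc_id)           # upper layer gave up first
    return client.on_network_timeout()          # client survives the timeout


if __name__ == "__main__":
    print("client alive after timeout:", demo())
```

In this sketch, skipping the `remove_rpc_request` call leaves the entry in `outstanding`, so the later network-level timeout marks the client dead. That is the mismatch between the two layers that the change is meant to avoid.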

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-11866

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9917.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9917
    
----
commit b00f4f0b118d68c913d64cff68c2db05e1d1e3e5
Author: Marcelo Vanzin <[email protected]>
Date:   2015-11-20T20:42:12Z

    [SPARK-11866] [network] [core] Make sure timed out RPCs are cleaned up.
    

----


