GitHub user vanzin opened a pull request:
https://github.com/apache/spark/pull/9917
[SPARK-11866] [network] [core] Make sure timed out RPCs are cleaned up.
This change does a couple of different things to make sure that the
RpcEnv-level code and the network library agree about the status of
outstanding RPCs.
For RPCs that do not expect a reply ("RpcEnv.send"), support for one-way
messages (hello CORBA!) was added to the network layer. This is a
"fire and forget" message that does not require any state to be kept
by the TransportClient; as a result, the RpcEnv 'Ack' message is not needed
anymore.
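
To illustrate the difference, here is a minimal, self-contained sketch. It is
not the code in this patch: the class OneWaySketchClient and its methods are
hypothetical stand-ins, and payloads are plain strings for brevity. It only
shows that a request/response RPC has to remember a callback, while a one-way
message leaves no state behind on the client.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical stand-in for a transport client; not Spark's actual class.
class OneWaySketchClient {
  private final AtomicLong nextId = new AtomicLong();
  // Callbacks for RPCs that are still waiting for a reply.
  private final Map<Long, Consumer<String>> outstanding = new ConcurrentHashMap<>();
  // Stands in for the channel; records what would be written to the wire.
  private final StringBuffer wire = new StringBuffer();

  // Request/response: the callback is kept until a reply arrives.
  long sendRpc(String payload, Consumer<String> onReply) {
    long id = nextId.getAndIncrement();
    outstanding.put(id, onReply);
    wire.append("RPC#").append(id).append(':').append(payload).append('\n');
    return id;
  }

  // One-way ("fire and forget"): the message is written and nothing is tracked.
  void send(String payload) {
    wire.append("ONEWAY:").append(payload).append('\n');
  }

  int outstandingCount() {
    return outstanding.size();
  }

  String wireLog() {
    return wire.toString();
  }
}
```

Because send() registers nothing, there is no entry that an 'Ack' would have
to clear later, which is the point made above.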
For RPCs that do expect a reply ("RpcEnv.ask"), the network library now
returns the internal RPC id; if the RpcEnv layer decides to time out the
RPC before the network layer does, it now asks the TransportClient to
forget about the RPC, so that if the network-level timeout occurs, the
client is not killed.
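
The interaction can be sketched as follows. This is an illustration under
stated assumptions, not the patch itself: AskSketchClient and all of its
methods are hypothetical, and the rule "a network-level timeout only closes
the client while requests are outstanding" is an assumption made for the
sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical stand-in for a transport client; not Spark's actual class.
class AskSketchClient {
  private final AtomicLong nextId = new AtomicLong();
  // Callbacks for RPCs that are still waiting for a reply.
  private final Map<Long, Consumer<String>> outstanding = new ConcurrentHashMap<>();
  private volatile boolean closed = false;

  // Returns the internal RPC id so the caller can cancel the request later.
  long sendRpc(String payload, Consumer<String> onReply) {
    long id = nextId.getAndIncrement();
    outstanding.put(id, onReply);
    return id;
  }

  // Called by the upper layer when its own timeout fires first.
  void removeRpcRequest(long id) {
    outstanding.remove(id);
  }

  // Delivers a reply; replies for forgotten ids are silently dropped.
  void handleReply(long id, String reply) {
    Consumer<String> callback = outstanding.remove(id);
    if (callback != null) {
      callback.accept(reply);
    }
  }

  // Assumption: the network-level timeout closes the client only if
  // some request is still waiting for a reply.
  void onNetworkTimeout() {
    if (!outstanding.isEmpty()) {
      closed = true;
    }
  }

  boolean isClosed() {
    return closed;
  }
}
```

In this sketch, a caller that gives up on an ask calls removeRpcRequest(id)
with the id returned by sendRpc(); a later onNetworkTimeout() then finds
nothing outstanding and leaves the client open.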
As part of implementing the above, I cleaned up some of the code in the
netty rpc backend, removing types that were not necessary and factoring
out some common code. Of interest is a slight change in the exceptions
thrown when posting messages to a stopped RpcEnv; that's mostly to avoid
nasty error messages from the local-cluster backend when shutting down,
which pollute the terminal output.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/vanzin/spark SPARK-11866
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/9917.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #9917
----
commit b00f4f0b118d68c913d64cff68c2db05e1d1e3e5
Author: Marcelo Vanzin <[email protected]>
Date: 2015-11-20T20:42:12Z
[SPARK-11866] [network] [core] Make sure timed out RPCs are cleaned up.
----