Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/21451#discussion_r192136490
--- Diff: common/network-common/src/main/java/org/apache/spark/network/server/RpcHandler.java ---
@@ -38,15 +38,24 @@
    *
    * This method will not be called in parallel for a single TransportClient (i.e., channel).
    *
+   * The rpc *might* include a data stream in <code>streamData</code> (e.g. for uploading a large
+   * amount of data which should not be buffered in memory here). Any errors while handling the
+   * streamData will lead to failing this entire connection -- all other in-flight rpcs will fail.
+   * If stream data is not null, you *must* call <code>streamData.registerStreamCallback</code>
+   * before this method returns.
+   *
    * @param client A channel client which enables the handler to make requests back to the sender
    *               of this RPC. This will always be the exact same object for a particular channel.
    * @param message The serialized bytes of the RPC.
+   * @param streamData StreamData if there is data which is meant to be read via a StreamCallback;
+   *                   otherwise it is null.
    * @param callback Callback which should be invoked exactly once upon success or failure of the
    *                 RPC.
    */
   public abstract void receive(
       TransportClient client,
       ByteBuffer message,
+      StreamData streamData,
--- End diff --
Yes, there are other ways to do this, but I wanted to leave the old code
paths as close to untouched as possible to minimize the behavior change and the
risk of bugs. I also think it's helpful to clearly separate the portion that is
read entirely into memory from the streaming portion; it makes the data easier
to work with. Also, InputStream suggests the data is getting pulled rather than
pushed. Your earlier approach definitely gave a lot of inspiration for this
change. I'm hoping that making it a more isolated change helps us make progress here.
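
To make the contract above concrete, here is a minimal sketch of a handler that
registers the stream callback before receive() returns. This is not code from
this PR: StreamData's package and the single-argument registerStreamCallback
signature are assumptions based on the diff, and parseStreamId /
writeChunkToDisk are hypothetical helpers.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import org.apache.spark.network.client.RpcResponseCallback;
import org.apache.spark.network.client.StreamCallback;
import org.apache.spark.network.client.TransportClient;
import org.apache.spark.network.server.OneForOneStreamManager;
import org.apache.spark.network.server.RpcHandler;
import org.apache.spark.network.server.StreamData;  // assumed location of the class added by this PR
import org.apache.spark.network.server.StreamManager;

public class UploadRpcHandler extends RpcHandler {

  @Override
  public void receive(
      TransportClient client,
      ByteBuffer message,
      StreamData streamData,
      RpcResponseCallback callback) {
    // The small metadata portion of the rpc is fully in memory and decoded here.
    final String streamId = parseStreamId(message);  // hypothetical helper

    if (streamData == null) {
      // No streaming portion; behaves like the old code path.
      callback.onSuccess(ByteBuffer.allocate(0));
      return;
    }

    // Per the javadoc above: the callback must be registered before receive() returns.
    // Chunks are *pushed* into onData() as they arrive off the wire, so the large
    // payload is never buffered in memory here.
    streamData.registerStreamCallback(new StreamCallback() {
      @Override
      public void onData(String id, ByteBuffer buf) throws IOException {
        writeChunkToDisk(streamId, buf);  // hypothetical helper
      }

      @Override
      public void onComplete(String id) throws IOException {
        callback.onSuccess(ByteBuffer.allocate(0));
      }

      @Override
      public void onFailure(String id, Throwable cause) throws IOException {
        // Any error here fails the whole connection, as noted in the javadoc.
        callback.onFailure(cause);
      }
    });
  }

  @Override
  public StreamManager getStreamManager() {
    return new OneForOneStreamManager();
  }

  private String parseStreamId(ByteBuffer message) {
    // Decode whatever metadata the rpc carries; details are application-specific.
    return StandardCharsets.UTF_8.decode(message).toString();
  }

  private void writeChunkToDisk(String streamId, ByteBuffer buf) throws IOException {
    // Append the chunk to a per-stream file; omitted for brevity.
  }
}

The point of this shape is that the metadata in message stays fully in memory
while the large payload is handled chunk-by-chunk through the callback.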
---