Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/21451#discussion_r192136490
--- Diff: common/network-common/src/main/java/org/apache/spark/network/server/RpcHandler.java ---
@@ -38,15 +38,24 @@
    *
    * This method will not be called in parallel for a single TransportClient (i.e., channel).
    *
+   * The rpc *might* include a data stream in <code>streamData</code> (e.g. for uploading a large
+   * amount of data which should not be buffered in memory here). Any errors while handling the
+   * streamData will lead to failing this entire connection -- all other in-flight rpcs will fail.
+   * If stream data is not null, you *must* call <code>streamData.registerStreamCallback</code>
+   * before this method returns.
+   *
    * @param client A channel client which enables the handler to make requests back to the sender
    *               of this RPC. This will always be the exact same object for a particular channel.
    * @param message The serialized bytes of the RPC.
+   * @param streamData StreamData if there is data which is meant to be read via a StreamCallback;
+   *                   otherwise it is null.
    * @param callback Callback which should be invoked exactly once upon success or failure of the
    *                 RPC.
    */
   public abstract void receive(
       TransportClient client,
       ByteBuffer message,
+      StreamData streamData,
--- End diff --
Yes, there are other ways to do this, but I wanted to leave the old code
paths as close to untouched as possible to minimize the behavior change and the
risk of bugs. I also think it's helpful to clearly separate the portion that is
read entirely into memory from the streaming portion; it makes the data easier
to work with. Also, InputStream suggests the data is getting pulled rather than
pushed. Your earlier approach definitely gave a lot of inspiration for this
change. I'm hoping that making it a more isolated change helps us make progress here.
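
To make the contract above concrete, here is a minimal sketch of a handler that
registers the stream callback before receive() returns. This is not code from
this PR: StreamData's package and the single-argument registerStreamCallback
signature are assumptions based on the diff, and parseStreamId /
writeChunkToDisk are hypothetical helpers.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import org.apache.spark.network.client.RpcResponseCallback;
import org.apache.spark.network.client.StreamCallback;
import org.apache.spark.network.client.TransportClient;
import org.apache.spark.network.server.OneForOneStreamManager;
import org.apache.spark.network.server.RpcHandler;
import org.apache.spark.network.server.StreamData;  // assumed location of the class added by this PR
import org.apache.spark.network.server.StreamManager;

public class UploadRpcHandler extends RpcHandler {

  @Override
  public void receive(
      TransportClient client,
      ByteBuffer message,
      StreamData streamData,
      RpcResponseCallback callback) {
    // The small metadata portion of the rpc is fully in memory and decoded here.
    final String streamId = parseStreamId(message);  // hypothetical helper

    if (streamData == null) {
      // No streaming portion; behaves like the old code path.
      callback.onSuccess(ByteBuffer.allocate(0));
      return;
    }

    // Per the javadoc above: the callback must be registered before receive() returns.
    // Chunks are *pushed* into onData() as they arrive off the wire, so the large
    // payload is never buffered in memory here.
    streamData.registerStreamCallback(new StreamCallback() {
      @Override
      public void onData(String id, ByteBuffer buf) throws IOException {
        writeChunkToDisk(streamId, buf);  // hypothetical helper
      }

      @Override
      public void onComplete(String id) throws IOException {
        callback.onSuccess(ByteBuffer.allocate(0));
      }

      @Override
      public void onFailure(String id, Throwable cause) throws IOException {
        // Any error here fails the whole connection, as noted in the javadoc.
        callback.onFailure(cause);
      }
    });
  }

  @Override
  public StreamManager getStreamManager() {
    return new OneForOneStreamManager();
  }

  private String parseStreamId(ByteBuffer message) {
    // Decode whatever metadata the rpc carries; details are application-specific.
    return StandardCharsets.UTF_8.decode(message).toString();
  }

  private void writeChunkToDisk(String streamId, ByteBuffer buf) throws IOException {
    // Append the chunk to a per-stream file; omitted for brevity.
  }
}

The point of this shape is that the metadata in message stays fully in memory
while the large payload is handled chunk-by-chunk through the callback.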
---