Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21346#discussion_r192795662
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/server/RpcHandler.java ---
    @@ -38,15 +38,24 @@
        *
        * This method will not be called in parallel for a single TransportClient (i.e., channel).
        *
    +   * The rpc *might* include a data stream in <code>streamData</code> (e.g., for uploading a large
    +   * amount of data which should not be buffered in memory here).  Any errors while handling the
    +   * streamData will lead to failing this entire connection -- all other in-flight rpcs will fail.
    --- End diff --
    
    You bring up a good point here.  I was thinking about the places where an error might occur:
    
    1) while reading the stream data (i.e., StreamCallback.onData).  In the intended use case, this is basically just opening a file and writing bytes to it.
    
    2) post-processing the complete data (StreamCallback.onComplete).  This does the whole BlockManager.put, which can be rather complex.
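To make the two stages concrete, here is a minimal sketch of a callback that writes each chunk straight to a file in onData and only hands the finished file to a post-processing step in onComplete. It assumes a simplified interface loosely modeled on the shape of Spark's StreamCallback; UploadCallback, FileUploadCallback and the postProcess hook (a stand-in for BlockManager.put) are hypothetical names, not the code in this PR.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.Consumer;

// Hypothetical callback interface, loosely modeled on the shape of Spark's StreamCallback.
interface UploadCallback {
  void onData(String streamId, ByteBuffer buf) throws IOException;
  void onComplete(String streamId) throws IOException;
  void onFailure(String streamId, Throwable cause) throws IOException;
}

// Writes uploaded chunks straight to disk so the data is never fully buffered in memory.
class FileUploadCallback implements UploadCallback {
  private final Path file;
  private final FileChannel channel;
  private final Consumer<Path> postProcess; // hypothetical stand-in for BlockManager.put

  FileUploadCallback(Path file, Consumer<Path> postProcess) throws IOException {
    this.file = file;
    this.channel = FileChannel.open(file, StandardOpenOption.CREATE,
        StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING);
    this.postProcess = postProcess;
  }

  // Stage (1): append the chunk to the file; a channel write may consume only part
  // of the buffer, so loop until it is drained.
  @Override
  public void onData(String streamId, ByteBuffer buf) throws IOException {
    while (buf.hasRemaining()) {
      channel.write(buf);
    }
  }

  // Stage (2): all bytes have arrived; close the file and run the (possibly complex,
  // possibly failing) post-processing step on the complete data.
  @Override
  public void onComplete(String streamId) throws IOException {
    channel.close();
    postProcess.accept(file);
  }

  // Clean up the partial file if the stream is aborted.
  @Override
  public void onFailure(String streamId, Throwable cause) throws IOException {
    channel.close();
    Files.deleteIfExists(file);
  }
}
```

Stage (1) here does nothing beyond file I/O, which is why failures there should be rare, while all of the interesting (and failure-prone) logic is concentrated in stage (2).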
    
    Failures in (1) are unlikely and are difficult to recover from; failures in (2) are more likely, but the channel should be totally fine.  I've updated the code, comments, and test to make sure things are OK for (2):
    https://github.com/apache/spark/pull/21346/commits/6c086c51873c72fa0cf9f373afd069ac63de3b75
    
    Your points are still valid for (1), though I think we can live with it.
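The error-handling split described above can be modeled roughly as follows. This is a toy sketch, not Spark's actual request handling: ConnectionModel, Stage and the string outcomes are hypothetical. A failure while consuming stream data (stage 1) fails the whole connection and every in-flight RPC, as the javadoc in the diff describes, while a failure in post-processing (stage 2) is reported back only for the RPC that triggered it and leaves the connection open.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a single connection (channel) carrying several in-flight RPCs.
// All names here are hypothetical; this is not Spark's actual request handler.
class ConnectionModel {
  // One unit of work in handling an upload; may throw like a real callback would.
  interface Stage {
    void run() throws Exception;
  }

  final Map<Long, String> responses = new LinkedHashMap<>(); // requestId -> outcome
  final List<Long> inFlight = new ArrayList<>();
  boolean closed = false;

  void startRpc(long requestId) {
    inFlight.add(requestId);
  }

  void handleUpload(long requestId, List<Stage> onDataChunks, Stage onComplete) {
    if (closed) {
      responses.put(requestId, "FAILED (connection closed)");
      return;
    }
    // Stage (1): consume the stream data chunk by chunk.
    for (Stage chunk : onDataChunks) {
      try {
        chunk.run();
      } catch (Exception e) {
        // Error mid-stream: fail the whole connection and every in-flight RPC.
        failConnection(e);
        return;
      }
    }
    // Stage (2): post-process the complete data.
    try {
      onComplete.run();
      respond(requestId, "OK");
    } catch (Exception e) {
      // All stream bytes were already consumed: fail only this RPC, keep the channel.
      respond(requestId, "FAILED: " + e.getMessage());
    }
  }

  private void respond(long requestId, String outcome) {
    inFlight.remove(Long.valueOf(requestId));
    responses.put(requestId, outcome);
  }

  private void failConnection(Exception cause) {
    closed = true;
    for (long id : inFlight) {
      responses.put(id, "FAILED (connection closed): " + cause.getMessage());
    }
    inFlight.clear();
  }
}
```

One plausible reason for treating stage 1 so harshly is that a partially consumed stream may leave its remaining bytes on the channel with no easy way to resynchronize; by stage 2 every byte of the stream has been read, so the channel is in a clean state and a per-RPC failure is enough.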

