[GitHub] [spark] Victsm commented on a change in pull request #29855: [SPARK-32915][CORE] Network-layer and shuffle RPC layer changes to support push shuffle blocks

GitBox Fri, 02 Oct 2020 10:52:16 -0700


Victsm commented on a change in pull request #29855:
URL: https://github.com/apache/spark/pull/29855#discussion_r498965319




##########
File path: common/network-common/src/main/java/org/apache/spark/network/server/TransportRequestHandler.java
##########
@@ -181,6 +182,17 @@ public void onFailure(Throwable e) {
   private void processStreamUpload(final UploadStream req) {
     assert (req.body() == null);
     try {
+      // Retain the original metadata buffer, since it will be used during the invocation of
+      // this method. Will be released later.
+      req.meta.retain();
+      // Make a copy of the original metadata buffer. In benchmark, we noticed that
+      // we cannot respond the original metadata buffer back to the client, otherwise
+      // in cases where multiple concurrent shuffles are present, a wrong metadata might
+      // be sent back to client. This is related to the eager release of the metadata buffer,
+      // i.e., we always release the original buffer by the time the invocation of this
+      // method ends, instead of by the time we respond it to the client. This is necessary,
+      // otherwise we start seeing memory issues very quickly in benchmarks.
+      ByteBuffer meta = cloneBuffer(req.meta.nioByteBuffer());

Review comment:
   Regarding the `req.meta` issue, my understanding is the following:
   `processStreamUpload` is only responsible for creating a `StreamCallbackWithID`, which is added to the FrameDecoder as a stream interceptor.
   The Netty ByteBuf `req.meta` is released by the time this method exits.
   However, the stream callback still needs to send `req.meta` back in its response after this method exits.
   Reading the contents of the Netty ByteBuf after it has been released is what causes the issue mentioned in the comment.
   I tried delaying the release of `req.meta` until the stream callback finishes processing the stream, but that can lead to memory issues on the executor side when there are many blocks to transfer.
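   To make the failure mode concrete, here is a minimal sketch using only `java.nio`. It is not the code from this PR: the body of `cloneBuffer` is not shown in the diff, so the version below is an assumed implementation, and the direct buffer named `pooled` is only a stand-in for pooled Netty memory that gets reused after release. The class name and the `text` helper are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CloneBufferSketch {

  // Assumed implementation of a cloneBuffer-style helper: copies the readable
  // bytes of buf into a fresh heap buffer that owns its own memory.
  public static ByteBuffer cloneBuffer(ByteBuffer buf) {
    ByteBuffer clone = ByteBuffer.allocate(buf.remaining());
    // duplicate() shares content but has its own position/limit,
    // so the caller's buffer is left untouched by the copy.
    clone.put(buf.duplicate());
    clone.flip();
    return clone;
  }

  // Decodes the readable bytes of buf as UTF-8 without moving its position.
  public static String text(ByteBuffer buf) {
    return StandardCharsets.UTF_8.decode(buf.duplicate()).toString();
  }

  public static void main(String[] args) {
    // Stand-in for pooled memory holding the metadata of the first upload.
    ByteBuffer pooled = ByteBuffer.allocateDirect(16);
    pooled.put("shuffle-1".getBytes(StandardCharsets.UTF_8));
    pooled.flip();

    // A view that keeps pointing at the pooled memory (what a callback would
    // hold if it kept the original buffer), and an independent copy.
    ByteBuffer staleView = pooled.duplicate();
    ByteBuffer copy = cloneBuffer(pooled);

    // The "released" memory is reused for a different, concurrent upload.
    pooled.clear();
    pooled.put("shuffle-2".getBytes(StandardCharsets.UTF_8));
    pooled.flip();

    System.out.println("stale view: " + text(staleView)); // now shows the other upload's metadata
    System.out.println("copy:       " + text(copy));      // still shows the original metadata
  }
}
```

   The copy costs one small heap allocation per upload, which is the trade-off against keeping the original buffer retained until the callback completes.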




