squito commented on a change in pull request #23453: [SPARK-26089][CORE] Handle corruption in large shuffle blocks
URL: https://github.com/apache/spark/pull/23453#discussion_r264268057
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/util/Utils.scala
 ##########
 @@ -337,6 +338,44 @@ private[spark] object Utils extends Logging {
     }
   }
 
+  /**
+   * Copy all data from an InputStream to an OutputStream upto maxSize and
+   * close the input stream if all data is read.
 +   * @return A tuple of boolean, which is whether the stream was fully copied, and an InputStream,
 +   *         which is a combined stream of read data and any remaining data
 
 Review comment:
   This doc needs updating now. Something like:
   
   ```
   Copy the first `maxSize` bytes of data from the InputStream to an in-memory
   buffer, while still exposing the entire original input stream, primarily to
   check for corruption.

   This returns a new InputStream which contains the same data as the original
   input stream. It may be entirely an in-memory buffer, or it may be a
   combination of in-memory data followed by the remainder of the original
   stream. The only real use of this is if the original input stream will
   potentially detect corruption while the data is being read (e.g. from
   compression). This allows for an eager check of corruption in the first
   `maxSize` bytes of data.

   @return A tuple of a boolean, which is whether the stream was fully copied,
   and an InputStream which includes all data from the original stream
   (combining buffered data and remaining data in the original stream)
   ```
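
   The behavior described in that doc can be sketched roughly as follows. This is a simplified, hypothetical version for illustration only, not the actual code in the PR; the real implementation in `Utils.scala` differs in details such as buffer management and how it detects a stream that is exactly `maxSize` bytes long:

   ```scala
   import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream, SequenceInputStream}

   // Hypothetical sketch: copy up to `maxSize` bytes into memory so the caller
   // can eagerly trigger any corruption detection (e.g. a decompression error),
   // while the returned stream still yields all of the original data.
   def copyStreamUpTo(in: InputStream, maxSize: Long): (Boolean, InputStream) = {
     val buf = new ByteArrayOutputStream()
     val chunk = new Array[Byte](8192)
     var copied = 0L
     var eof = false
     while (!eof && copied < maxSize) {
       val toRead = math.min(chunk.length.toLong, maxSize - copied).toInt
       val n = in.read(chunk, 0, toRead)
       if (n < 0) eof = true
       else { buf.write(chunk, 0, n); copied += n }
     }
     val buffered = new ByteArrayInputStream(buf.toByteArray)
     if (eof) {
       // All data fit within maxSize: close the source, serve from memory.
       in.close()
       (true, buffered)
     } else {
       // Note: a stream of exactly maxSize bytes is reported as not fully
       // copied here; the real implementation distinguishes that case.
       (false, new SequenceInputStream(buffered, in))
     }
   }
   ```

   The key design point the doc is trying to capture: the returned stream is either purely the in-memory buffer (fully copied, source closed) or a `SequenceInputStream` stitching the buffered prefix onto the unread remainder of the original stream.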

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
