dbtsai commented on issue #26085: [SPARK-29434] [Core] Improve the MapStatuses 
Serialization Performance
URL: https://github.com/apache/spark/pull/26085#issuecomment-541256017
 
 
   @tgravescs the unit is records/ms. When the number of blocks is large, the 
two-step and one-step approaches give similar results, but the two-step 
approach will never be slower.
   
   I switched to `org.apache.commons.io.output.ByteArrayOutputStream` 
instead of the standard `java.io.ByteArrayOutputStream`, and saw another 25% 
performance gain.
   
   The Javadoc for the Apache Commons IO `ByteArrayOutputStream` says:
   
   ```java
   /* In contrast
    * to the original it doesn't reallocate the whole memory block but allocates
    * additional buffers. This way no buffers need to be garbage collected and
    * the contents don't have to be copied to the new buffer.
    */
   ```
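To make the quoted behavior concrete, here is a minimal sketch of the chunked-buffer idea using only the JDK. `ChunkedByteArrayOutputStream` is a hypothetical class written for illustration; it is not the Commons IO implementation, and the fixed chunk size is a simplifying assumption (the real class's buffer sizing policy may differ). The point is only that growing the stream appends a new buffer instead of reallocating and copying everything written so far.

```java
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration only (not the Commons IO class): grows by
 * appending fixed-size chunks instead of reallocating one large array.
 */
class ChunkedByteArrayOutputStream extends OutputStream {
    private final List<byte[]> fullChunks = new ArrayList<>();
    private final int chunkSize;
    private byte[] current;
    private int posInCurrent = 0;
    private int total = 0;

    ChunkedByteArrayOutputStream(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
        this.current = new byte[chunkSize];
    }

    /** Retires the full current chunk and starts a fresh one; nothing is copied. */
    private void startNewChunk() {
        fullChunks.add(current);
        current = new byte[chunkSize];
        posInCurrent = 0;
    }

    @Override
    public void write(int b) {
        if (posInCurrent == chunkSize) {
            startNewChunk();
        }
        current[posInCurrent++] = (byte) b;
        total++;
    }

    @Override
    public void write(byte[] src, int off, int len) {
        if (off < 0 || len < 0 || len > src.length - off) {
            throw new IndexOutOfBoundsException();
        }
        while (len > 0) {
            if (posInCurrent == chunkSize) {
                startNewChunk();
            }
            // Copy as much as fits into the remaining space of the current chunk.
            int n = Math.min(len, chunkSize - posInCurrent);
            System.arraycopy(src, off, current, posInCurrent, n);
            posInCurrent += n;
            off += n;
            len -= n;
            total += n;
        }
    }

    public int size() {
        return total;
    }

    /** The only full copy happens here, once, when the caller asks for the bytes. */
    public byte[] toByteArray() {
        byte[] result = new byte[total];
        int offset = 0;
        for (byte[] chunk : fullChunks) {
            System.arraycopy(chunk, 0, result, offset, chunkSize);
            offset += chunkSize;
        }
        System.arraycopy(current, 0, result, offset, posInCurrent);
        return result;
    }
}
```

By contrast, `java.io.ByteArrayOutputStream` keeps a single backing array and copies its contents into a larger one whenever capacity runs out, which is the extra copying and garbage the quoted doc refers to.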
