[GitHub] [spark] dongjoon-hyun commented on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

GitBox Tue, 23 Feb 2021 13:24:11 -0800


dongjoon-hyun commented on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-784522469



   Hi, @HyukjinKwon . Why do you think so?
   > I think it's not an obvious win though .. Zstd looks more for archiving 
purpose with less throughput with high compression ratio vs lz4 is for more 
throughput with less compression.
   
   According to the benchmark, 
   - LZ4 1.7.5 is not the winner on compression time. If you also consider the 
upload time to the remote storage, ZSTD can be the winner.
   - LZ4 1.7.5 decompression time might be your reason. However, this is an 
event log.
      - When you download a log from `Spark History Server`, a ZSTD log file 
will be downloaded 2~3x faster.
      - Also, when you view the log via `Spark History Server`, Spark History 
Server also downloads it from the remote storage like S3 and decompresses 
it. The 2~3x faster download will compensate for the decompression slowdown.
    
   In addition, for storage cost savings, ZSTD is a clear winner.
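
To make the download-versus-decompression trade-off concrete, here is a rough back-of-the-envelope sketch. Every number in it (log size, compression ratios, network and decompression throughput) is a hypothetical placeholder chosen for illustration, not a value taken from the benchmark, and the helper name `fetch_and_read_seconds` is made up for this example.

```python
def fetch_and_read_seconds(raw_mb, ratio, net_mb_per_s, decompress_mb_per_s):
    """Estimate the time to download a compressed event log and decompress it.

    ratio: compressed size / raw size (smaller means better compression).
    decompress_mb_per_s: throughput measured on the decompressed output.
    """
    compressed_mb = raw_mb * ratio
    download_s = compressed_mb / net_mb_per_s
    decompress_s = raw_mb / decompress_mb_per_s
    return download_s + decompress_s


# Hypothetical inputs: a 1000 MB raw event log fetched over a 50 MB/s link.
RAW_MB = 1000.0
NET_MB_PER_S = 50.0

# Assumed codec characteristics (illustrative only, not benchmark results):
# LZ4 decompresses faster, ZSTD produces a file about 2.5x smaller.
lz4 = fetch_and_read_seconds(RAW_MB, ratio=0.30,
                             net_mb_per_s=NET_MB_PER_S,
                             decompress_mb_per_s=2000.0)
zstd = fetch_and_read_seconds(RAW_MB, ratio=0.12,
                              net_mb_per_s=NET_MB_PER_S,
                              decompress_mb_per_s=800.0)

print(f"LZ4  total: {lz4:.2f} s")
print(f"ZSTD total: {zstd:.2f} s")
```

Under these assumed inputs the smaller ZSTD file more than pays back its slower decompression. On a much faster network link the balance could shift, so the inputs should be replaced with measured values before drawing conclusions.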




