HeartSaVioR commented on pull request #31618: URL: https://github.com/apache/spark/pull/31618#issuecomment-784551734
I agree with the statement that in practice the event log directory is most likely placed in remote storage, and in that case reducing the size would improve overall latency. It would be really appreciated if we could see a direct benchmark (compress and send to S3, then receive from S3 and decompress), probably run 10~100 times for each case, taking the median. That said, it's optional, and I tend to agree that a small difference in compression/decompression time could be outweighed by the reduced network cost.

Btw,

```
$ lz4 -d spark-d3deba027bd34435ba849e14fc2c42ef.lz4
Decoding file spark-d3deba027bd34435ba849e14fc2c42ef
Error 44 : Unrecognized header : file cannot be decoded
```

makes me feel Spark does something wrong with lz4, or lz4 has a variant which isn't compatible. Does anyone know why this doesn't work?
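For reference, the benchmark proposed above (repeat each operation 10~100 times, take the median) could be sketched roughly as follows. This is a hypothetical illustration, not the actual test: Python's standard library has no lz4 codec, so `zlib` stands in for it, the S3 transfer legs are omitted, and the payload is synthetic rather than a real event log.

```python
# Hypothetical benchmark sketch: time compression/decompression of an
# event-log-like payload several times and report the median, as suggested
# in the comment. zlib is a stand-in codec; real runs would use lz4/zstd
# and include the S3 upload/download legs.
import time
import zlib
from statistics import median

def bench(fn, runs=10):
    """Run fn `runs` times and return the median wall-clock seconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return median(times)

# Synthetic payload standing in for an event log file.
payload = b"SparkListenerEvent " * 100_000

compressed = zlib.compress(payload)
compress_t = bench(lambda: zlib.compress(payload))
decompress_t = bench(lambda: zlib.decompress(compressed))

print(f"ratio={len(compressed) / len(payload):.3f} "
      f"compress={compress_t * 1e3:.2f}ms "
      f"decompress={decompress_t * 1e3:.2f}ms")
```

Taking the median rather than the mean makes the numbers robust to one-off outliers (GC pauses, cold caches), which matters for runs this short.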
