HeartSaVioR commented on pull request #31618: URL: https://github.com/apache/spark/pull/31618#issuecomment-784551734
I agree with the statement that in practice the event log directory is most likely placed in remote storage, and in that case reducing the size would improve overall latency. It would be really appreciated if we could see a direct benchmark (compress and send to S3, then receive from S3 and decompress), probably run 10~100 times for each case, taking the median. That said, it's optional, and I tend to agree that a small difference in compression/decompression time could be outweighed by the reduced network cost.

Btw,

```
$ lz4 -d spark-d3deba027bd34435ba849e14fc2c42ef.lz4
Decoding file spark-d3deba027bd34435ba849e14fc2c42ef
Error 44 : Unrecognized header : file cannot be decoded
```

makes me feel Spark does something wrong with lz4, or lz4 has a variant which isn't compatible. Does anyone know why this doesn't work?
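For reference, the benchmark proposed above (repeat each operation 10~100 times, take the median) could be sketched roughly as follows. This is a hypothetical illustration, not the actual test: Python's standard library has no lz4 codec, so `zlib` stands in for it, the S3 transfer legs are omitted, and the payload is synthetic rather than a real event log.

```python
# Hypothetical benchmark sketch: time compression/decompression of an
# event-log-like payload several times and report the median, as suggested
# in the comment. zlib is a stand-in codec; real runs would use lz4/zstd
# and include the S3 upload/download legs.
import time
import zlib
from statistics import median

def bench(fn, runs=10):
    """Run fn `runs` times and return the median wall-clock seconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return median(times)

# Synthetic payload standing in for an event log file.
payload = b"SparkListenerEvent " * 100_000

compressed = zlib.compress(payload)
compress_t = bench(lambda: zlib.compress(payload))
decompress_t = bench(lambda: zlib.decompress(compressed))

print(f"ratio={len(compressed) / len(payload):.3f} "
      f"compress={compress_t * 1e3:.2f}ms "
      f"decompress={decompress_t * 1e3:.2f}ms")
```

Taking the median rather than the mean makes the numbers robust to one-off outliers (GC pauses, cold caches), which matters for runs this short.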
