Re: [PR] [SPARK-52482][SQL][CORE] ZStandard support for file data source reader [spark]

via GitHub Sun, 15 Jun 2025 18:55:54 -0700


pan3793 commented on PR #51182:
URL: https://github.com/apache/spark/pull/51182#issuecomment-2974887472


   Hadoop has a built-in `org.apache.hadoop.io.compress.ZStandardCodec`, but it 
requires Hadoop to be compiled with the native library. I think the right 
direction is to migrate it to zstd-jni, similar to what was done for other 
codecs; see HADOOP-17292 (lz4), HADOOP-17125 (snappy), and HADOOP-17825 (gzip).
   
   Even if you don't want to touch the Hadoop code, the approach in this PR 
looks like overkill. Hadoop provides the 
`org.apache.hadoop.io.compress.CompressionCodec` interface for implementing 
custom codecs, so implementing an 
`org.apache.spark.xxx.SparkZstdCompressionCodec` and registering it through the 
`io.compression.codecs` setting should work.
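   
   As a rough sketch of the registration step only (not something this PR 
contains): Spark copies `spark.hadoop.*` properties into the Hadoop 
configuration with the prefix stripped, so the codec list could be set from 
`spark-defaults.conf`. The class name below is the placeholder from this 
comment, not an existing class, and the codec itself would still have to be 
written on top of zstd-jni.

   ```
   # spark-defaults.conf (sketch; the class name is a placeholder, not a real class)
   # Becomes io.compression.codecs in the Hadoop configuration.
   spark.hadoop.io.compression.codecs  org.apache.spark.xxx.SparkZstdCompressionCodec
   ```

   The value is a comma-separated list of codec class names, so any other 
codecs that are configured explicitly would need to stay in the list.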


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

