Re: [PR] [SPARK-52482][SQL][CORE] ZStandard support for file data source reader [spark]

via GitHub Wed, 18 Jun 2025 22:20:51 -0700


pan3793 commented on PR #51182:
URL: https://github.com/apache/spark/pull/51182#issuecomment-2986645298


   Instead of relying on the filename suffix or a Spark session conf to choose 
the codec, was detecting it from the file's magic number considered? For 
example, `file` identifies the format from the content, whatever the suffix:
   
   ```
   $ file unit-tests.log
   unit-tests.log: ASCII text, with very long lines (388)
   $ file unit-tests.log.gz
   unit-tests.log.gz: gzip compressed data, was "unit-tests.log", last modified: Mon Apr 21 13:03:04 2025, from Unix, original size modulo 2^32 11024393
   $ file unit-tests.log.zst
   unit-tests.log.zst: Zstandard compressed data (v0.8+), Dictionary ID: None
   $ cp unit-tests.log.gz unit-tests.log.gz.foo
   $ file unit-tests.log.gz.foo
   unit-tests.log.gz.foo: gzip compressed data, was "unit-tests.log", last modified: Mon Apr 21 13:03:04 2025, from Unix, original size modulo 2^32 11024393
   $ file unit-tests.log.zst.bar
   unit-tests.log.zst.bar: Zstandard compressed data (v0.8+), Dictionary ID: None
   ```
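
To make the suggestion concrete, here is a minimal sketch of magic-number sniffing. It is written in Python for brevity, not as Spark code; `sniff_codec` and `sniff_codec_from_path` are hypothetical names, and only the gzip and Zstandard frame magic numbers are handled (Zstandard skippable frames and other codecs are ignored here).

```python
from typing import Optional

# Leading bytes defined by the gzip and Zstandard formats.
GZIP_MAGIC = b"\x1f\x8b"
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # 0xFD2FB528 stored little-endian


def sniff_codec(header: bytes) -> Optional[str]:
    """Guess a codec name from the first bytes of a file (hypothetical helper).

    Returns None when the header matches neither known magic number,
    in which case a caller could fall back to the suffix or a conf.
    """
    if header.startswith(ZSTD_MAGIC):
        return "zstd"
    if header.startswith(GZIP_MAGIC):
        return "gzip"
    return None


def sniff_codec_from_path(path: str) -> Optional[str]:
    """Read only the first four bytes of the file and sniff the codec."""
    with open(path, "rb") as f:
        return sniff_codec(f.read(4))
```

With this approach a file such as `unit-tests.log.gz.foo` would still be detected as gzip, since the suffix is never consulted. A plain-text file returns `None`, so the suffix or session conf could remain as a fallback.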


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


