pan3793 commented on PR #51182:
URL: https://github.com/apache/spark/pull/51182#issuecomment-2986645298

   Instead of relying on the filename suffix or a Spark session conf to choose the codec, I wonder whether detecting it from the file's magic number was considered? For example, `file` identifies the format even when the suffix is misleading:
   
   ```
   $ file unit-tests.log
   unit-tests.log: ASCII text, with very long lines (388)
   $ file unit-tests.log.gz
   unit-tests.log.gz: gzip compressed data, was "unit-tests.log", last modified: Mon Apr 21 13:03:04 2025, from Unix, original size modulo 2^32 11024393
   $ file unit-tests.log.zst
   unit-tests.log.zst: Zstandard compressed data (v0.8+), Dictionary ID: None
   $ cp unit-tests.log.gz unit-tests.log.gz.foo
   $ file unit-tests.log.gz.foo
   unit-tests.log.gz.foo: gzip compressed data, was "unit-tests.log", last modified: Mon Apr 21 13:03:04 2025, from Unix, original size modulo 2^32 11024393
   $ file unit-tests.log.zst.bar
   unit-tests.log.zst.bar: Zstandard compressed data (v0.8+), Dictionary ID: None
   ```
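
To make the idea concrete, here is a minimal sketch of magic-number detection. It is not code from the PR: the function name `detect_codec` and the return values are hypothetical, and it only checks the two formats shown above. The byte values are the standard ones (gzip starts with `1f 8b`; a Zstandard frame starts with `0xFD2FB528` stored little-endian, i.e. `28 b5 2f fd`).

```python
import gzip
import os
import tempfile

# Leading bytes ("magic numbers") of each format.
GZIP_MAGIC = b"\x1f\x8b"          # gzip ID1, ID2 (RFC 1952)
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # 0xFD2FB528, little-endian on disk


def detect_codec(path):
    """Hypothetical helper: guess the codec from the file header.

    Returns "gzip", "zstd", or None (treat as uncompressed).
    The filename suffix is never consulted.
    """
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(ZSTD_MAGIC):
        return "zstd"
    return None


if __name__ == "__main__":
    # Mirror the transcript: a gzip file with an unrelated suffix.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "unit-tests.log.gz.foo")
        with gzip.open(path, "wb") as f:
            f.write(b"some log line\n")
        print(detect_codec(path))
```

A plain-text file whose first bytes happen to match a magic number would be misdetected, so a real implementation might still want the suffix or conf as a fallback or override.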


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

