pan3793 commented on PR #51182: URL: https://github.com/apache/spark/pull/51182#issuecomment-2986645298
Instead of relying on the filename suffix or a Spark session conf to choose the codec, I wonder whether detecting the codec from the file's magic number was considered. For example, `file` identifies the format from the leading bytes, regardless of the suffix:

```
$ file unit-tests.log
unit-tests.log: ASCII text, with very long lines (388)
$ file unit-tests.log.gz
unit-tests.log.gz: gzip compressed data, was "unit-tests.log", last modified: Mon Apr 21 13:03:04 2025, from Unix, original size modulo 2^32 11024393
$ file unit-tests.log.zst
unit-tests.log.zst: Zstandard compressed data (v0.8+), Dictionary ID: None
$ cp unit-tests.log.gz unit-tests.log.gz.foo
$ file unit-tests.log.gz.foo
unit-tests.log.gz.foo: gzip compressed data, was "unit-tests.log", last modified: Mon Apr 21 13:03:04 2025, from Unix, original size modulo 2^32 11024393
$ file unit-tests.log.zst.bar
unit-tests.log.zst.bar: Zstandard compressed data (v0.8+), Dictionary ID: None
```
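A minimal sketch of the magic-number idea in Java, assuming only gzip and zstd need to be told apart from uncompressed text. `MagicNumberSniffer` and `detectCodec` are hypothetical names for illustration, not existing Spark APIs. The signatures used are the gzip member header (`1F 8B`, RFC 1952) and the Zstandard frame magic `0xFD2FB528` stored little-endian (RFC 8878); any other codec would need its own signature added.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class MagicNumberSniffer {
    // gzip members start with 0x1F 0x8B (RFC 1952).
    private static final byte[] GZIP_MAGIC = {(byte) 0x1F, (byte) 0x8B};
    // Zstandard frames start with 0xFD2FB528, written little-endian (RFC 8878).
    private static final byte[] ZSTD_MAGIC = {(byte) 0x28, (byte) 0xB5, (byte) 0x2F, (byte) 0xFD};

    // Hypothetical helper: classifies data by its leading bytes only, ignoring any filename.
    public static String detectCodec(byte[] header) {
        if (startsWith(header, ZSTD_MAGIC)) return "zstd";
        if (startsWith(header, GZIP_MAGIC)) return "gzip";
        return "none"; // treat anything else as uncompressed
    }

    // Reads at most 4 bytes (the longest signature above) from the start of the file.
    public static String detectCodec(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            return detectCodec(in.readNBytes(ZSTD_MAGIC.length));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java MagicNumberSniffer unit-tests.log.gz.foo unit-tests.log.zst.bar
        for (String arg : args) {
            System.out.println(arg + ": " + detectCodec(Path.of(arg)));
        }
    }
}
```

Because only the first few bytes are read, a renamed file such as `unit-tests.log.gz.foo` is still classified by content, much as `file` does above. Sniffing is a heuristic, though: it only recognizes the signatures listed, so a suffix or conf could still serve as a fallback.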
