Andreas Meier created TIKA-2576:
-----------------------------------

             Summary: Add application/zstd detection and parser
                 Key: TIKA-2576
                 URL: https://issues.apache.org/jira/browse/TIKA-2576
             Project: Tika
          Issue Type: Improvement
          Components: detector, parser
            Reporter: Andreas Meier
         Attachments: huffman-compressed-larger, huffmann-compressed-larger-result.txt

The IETF is currently reviewing the specification of Zstandard compression and the application/zstd media type:
[https://tools.ietf.org/id/draft-kucherawy-dispatch-zstd-01.html]

As soon as application/zstd is accepted as a standard media type, detection and parsing for it should be implemented in Tika.

Possible mime detection for tika-mimetypes.xml (the second comment has to be changed when the standard is final):
{code:xml}
<mime-type type="application/zstd">
  <_comment>https://en.wikipedia.org/wiki/Zstandard</_comment>
  <_comment>https://tools.ietf.org/id/draft-kucherawy-dispatch-zstd-01.html</_comment>
  <magic priority="50">
    <match value="0xFD2FB528" type="little32" offset="0"/>
  </magic>
  <glob pattern="*.zstd"/>
</mime-type>
{code}

commons-compress version 1.16 and later provides a compressor and decompressor for the algorithm, based on com.github.luben zstd-jni:
[https://github.com/luben/zstd-jni]

Attached are a sample zstd file (huffman-compressed-larger) and the result of decompressing it. Decompression was done with commons-compress 1.16.1 and zstd-jni 1.3.3-3:
{code:xml}
<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-compress</artifactId>
  <version>1.16.1</version>
</dependency>
<dependency>
  <groupId>com.github.luben</groupId>
  <artifactId>zstd-jni</artifactId>
  <version>1.3.3-3</version>
</dependency>
{code}

Regards
Andreas
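
For illustration, the proposed `little32` match amounts to reading the first four bytes of the file as a little-endian 32-bit integer and comparing the result with 0xFD2FB528 (on disk: 28 B5 2F FD). The sketch below shows that check in plain Java. The class `ZstdMagicSketch` and its method names are hypothetical and not part of Tika or commons-compress; it only checks the magic number of a standard Zstandard frame and is not how Tika's detector is implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper (not a Tika class): checks for the Zstandard frame magic. */
final class ZstdMagicSketch {

    /** Magic number as given in the proposed match (value="0xFD2FB528", type="little32"). */
    static final int ZSTD_MAGIC = 0xFD2FB528;

    private ZstdMagicSketch() {}

    /** Returns true if the first four bytes, read little-endian, equal the zstd magic. */
    static boolean looksLikeZstd(byte[] prefix) {
        if (prefix == null || prefix.length < 4) {
            return false;
        }
        int value = ByteBuffer.wrap(prefix, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
        return value == ZSTD_MAGIC;
    }

    /** Peeks at the first four bytes of a mark/reset-capable stream without consuming them. */
    static boolean streamLooksLikeZstd(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(4);
        try {
            byte[] prefix = new byte[4];
            int n = 0;
            while (n < prefix.length) {
                int r = in.read(prefix, n, prefix.length - n);
                if (r < 0) {
                    break; // stream ended before four bytes were available
                }
                n += r;
            }
            return n == prefix.length && looksLikeZstd(prefix);
        } finally {
            in.reset(); // leave the stream positioned at the start for a later parser
        }
    }

    public static void main(String[] args) throws IOException {
        // First bytes of a zstd frame as they appear on disk, plus one arbitrary byte.
        byte[] sample = {(byte) 0x28, (byte) 0xB5, (byte) 0x2F, (byte) 0xFD, 0x00};
        System.out.println(streamLooksLikeZstd(new ByteArrayInputStream(sample)));
    }
}
```

The stream variant resets the stream after peeking, so the same stream could afterwards be handed to a decompressor such as the one in commons-compress.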