On Tue, 7 Feb 2012, Jan Høydahl wrote:
Would it be possible to add support to extract the proprietary MS .CAB archive format? I cannot find any Java-based extractors out there but there exists one in C.

You'd need to read either the file format docs or the C source code (whichever is easier) to understand the format, then use that to write Java code for it. I think you should be able to find existing Java code to handle DEFLATE (in Java itself or Commons Compress) and LZX (in POI), but I'm not sure about Quantum.
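
To give a feel for the starting point, here is a rough sketch of reading the fixed part of the cabinet header in Java. It assumes the layout I remember from the published cabinet format description (a 36-byte little-endian CFHEADER that starts with the "MSCF" signature), so check the offsets against the docs before relying on it. The class and field names are made up for illustration, and it ignores the optional reserved areas, the folder and file tables, and all decompression.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of Tika: parses only the fixed CFHEADER fields.
class CabHeaderSketch {
    // Size of the fixed part of the header (assumed from the format docs).
    static final int FIXED_HEADER_SIZE = 36;

    final long cabinetSize;      // cbCabinet: total size of the cabinet file
    final long firstFileOffset;  // coffFiles: offset of the first CFFILE entry
    final int versionMinor;
    final int versionMajor;
    final int folderCount;       // cFolders
    final int fileCount;         // cFiles
    final int flags;

    private CabHeaderSketch(ByteBuffer buf) {
        // Absolute reads; the masks turn signed Java values into unsigned ones.
        cabinetSize = buf.getInt(8) & 0xFFFFFFFFL;
        firstFileOffset = buf.getInt(16) & 0xFFFFFFFFL;
        versionMinor = buf.get(24) & 0xFF;
        versionMajor = buf.get(25) & 0xFF;
        folderCount = buf.getShort(26) & 0xFFFF;
        fileCount = buf.getShort(28) & 0xFFFF;
        flags = buf.getShort(30) & 0xFFFF;
    }

    // Parses the first 36 bytes of a cabinet held in memory.
    static CabHeaderSketch parse(byte[] raw) {
        if (raw.length < FIXED_HEADER_SIZE) {
            throw new IllegalArgumentException("Too short to be a CAB header");
        }
        if (raw[0] != 'M' || raw[1] != 'S' || raw[2] != 'C' || raw[3] != 'F') {
            throw new IllegalArgumentException("Missing MSCF signature");
        }
        ByteBuffer buf = ByteBuffer.wrap(raw, 0, FIXED_HEADER_SIZE)
                .order(ByteOrder.LITTLE_ENDIAN);
        return new CabHeaderSketch(buf);
    }

    // Reads the header from the start of a stream.
    static CabHeaderSketch read(InputStream in) throws IOException {
        byte[] raw = new byte[FIXED_HEADER_SIZE];
        new DataInputStream(in).readFully(raw);
        return parse(raw);
    }
}
```

From there, each folder entry tells you which compression type its data blocks use, which is where the existing DEFLATE code would plug in.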

Alternatively, if you have command line tools to read the format, you may be able to use them from Tika. However, that'd need a bit of work, as Tika's external parser support doesn't currently handle embedded resources.
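
As a stopgap outside of Tika, something along these lines could unpack the archive with an external tool and hand back the extracted files, which you could then feed to Tika one by one. This is only a sketch: it assumes the cabextract command is installed and on the PATH, and that its -q (quiet) and -d (output directory) options behave as documented. The class and method names are hypothetical, and it does not use Tika's external parser support at all.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: shells out to cabextract instead of parsing CAB in Java.
class CabExtractSketch {
    // Builds the command line; assumes "cabextract" is available on the PATH.
    static List<String> buildCommand(String cabFile, String outputDir) {
        return Arrays.asList("cabextract", "-q", "-d", outputDir, cabFile);
    }

    // Extracts the cabinet into a fresh temp directory and lists the results.
    static List<Path> extract(Path cabFile) throws IOException, InterruptedException {
        Path outputDir = Files.createTempDirectory("cab-extract");
        Process process = new ProcessBuilder(
                buildCommand(cabFile.toString(), outputDir.toString()))
                .inheritIO() // pass the tool's output through to our console
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("cabextract exited with code " + exitCode);
        }
        // Collect every regular file the tool wrote, including nested ones.
        try (Stream<Path> paths = Files.walk(outputDir)) {
            return paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
    }
}
```

The caller is responsible for cleaning up the temp directory afterwards.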

Nick
