On Tue, 9 Aug 2011, Trevor Watson wrote:
Is there a list of file extensions supported by Tika? Or do we have to go through the list of file types and find the extensions ourselves?

Tika supports certain mimetypes, not extensions as such. Depending on which detector you're using, Tika can happily extract text and metadata from a file with an incorrect extension on it.

Additionally, not all extensions are unique: some are claimed by more than one file format (and hence by more than one mimetype). The mimetypes are unique, though.
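
To make the ambiguity concrete, here is a minimal sketch in plain Java (no Tika calls). It inverts a mimetype-to-extensions table and reports the extensions claimed by more than one mimetype. The class name, method names and sample table are all made up for illustration; the table is not copied from Tika's mime registry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ExtensionAmbiguity {

    // Illustrative sample only (an assumption, NOT Tika's real registry data).
    static Map<String, List<String>> sampleRegistry() {
        Map<String, List<String>> registry = new TreeMap<>();
        registry.put("application/ogg", List.of(".ogx", ".ogg"));
        registry.put("application/pdf", List.of(".pdf"));
        registry.put("audio/ogg", List.of(".oga", ".ogg"));
        return registry;
    }

    // Invert mimetype -> extensions into extension -> mimetypes.
    static Map<String, List<String>> byExtension(Map<String, List<String>> registry) {
        Map<String, List<String>> inverted = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : registry.entrySet()) {
            for (String extension : entry.getValue()) {
                inverted.computeIfAbsent(extension, k -> new ArrayList<>())
                        .add(entry.getKey());
            }
        }
        return inverted;
    }

    // Keep only the extensions claimed by more than one mimetype.
    static Map<String, List<String>> ambiguous(Map<String, List<String>> registry) {
        Map<String, List<String>> result = new TreeMap<>();
        byExtension(registry).forEach((extension, types) -> {
            if (types.size() > 1) {
                result.put(extension, types);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        // Prints {.ogg=[application/ogg, audio/ogg]}
        System.out.println(ambiguous(sampleRegistry()));
    }
}
```

Each mimetype appears once as a key, while ".ogg" maps back to two of them, which is why a spec written in terms of extensions cannot identify a format reliably.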

So, you should probably speak to whoever wrote your requirements, and explain this bug in their spec to them!

That said, if you have the AutoDetectParser, you can get the list of supported mimetypes from it (see the Tika CLI for an example). The mime registry should (I think...) also be able to give you the default extension for each mimetype you have. There's also a list of all the detectable extensions for each mimetype, but I have a feeling that's not easily exposed.

Nick
