How does Tika detect MIME types when asked to parse a file? It routinely fails to detect text/plain files when the file has no extension, whereas the `file` command (i.e. libmagic) correctly identifies the same file as "ASCII English Text".
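For comparison, here is a rough sketch of the kind of content-based fallback that lets `file` label an extensionless file as text. It is a minimal, hypothetical illustration, not how Tika or libmagic is actually implemented. The names `looks_like_ascii_text` and `guess_mime` are made up. The sketch checks a couple of magic numbers first. If none match, it reports `text/plain` when every byte is printable ASCII or common whitespace.

```python
# Hypothetical sketch, not Tika's or libmagic's real code: detect by content
# only, so a missing file extension does not matter.

# Printable ASCII plus tab, line feed, form feed and carriage return.
TEXT_BYTES = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0C, 0x0D}


def looks_like_ascii_text(data: bytes) -> bool:
    """Return True if every byte is printable ASCII or common whitespace."""
    if not data:
        return False  # empty input gives no evidence either way
    return all(b in TEXT_BYTES for b in data)


def guess_mime(data: bytes) -> str:
    """Check a few magic numbers, then fall back to a plain-text check."""
    if data.startswith(b"%PDF-"):
        return "application/pdf"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if looks_like_ascii_text(data):
        return "text/plain"
    return "application/octet-stream"


print(guess_mime(b"Hello, world.\nThis file has no extension.\n"))  # text/plain
```

The point of the design is that the text check runs on the bytes themselves, so it behaves the same whether or not the file name carries an extension.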

From what I can tell from the mailing list archives, Tika was going to use an XML file from freedesktop.org (the shared MIME-info database), but is now using something from Nutch instead. Is this true? Where is this XML file? How compatible is it with libmagic's magic file?

--
Jonathan Koren
http://www.soe.ucsc.edu/~jonathan/

