How does Tika detect MIME types when asked to parse a file? It
routinely fails to detect text/plain files when the file has no
extension, whereas the `file` command (i.e. libmagic) correctly
identifies the same file as "ASCII English Text".

From what I understand from the mailing list archives, Tika was going
to use an XML file from freedesktop.org, but is now using something
from Nutch. Is this true? If so, where is this XML file, and how
compatible is it with libmagic's magic file?
--
Jonathan Koren
[email protected]
http://www.soe.ucsc.edu/~jonathan/