On Mon, 18 Jun 2012, Doug wrote:
I'm planning to use Tika as part of a process for cataloging data on a share drive. Based on the website and tika-mimetypes.xml, the type detection looks pretty comprehensive. However, while browsing tika-mimetypes.xml, I noticed that about half of the mime-types listed have no associated glob, root-XML, or magic elements. Without these match criteria, can Tika ever actually detect a file of one of these types?
For a type to be detected, Tika needs something to go on. That could be a glob, an XML root element, some magic, or a combination of them.
I browsed the detector source. It looks like it tries to match against magic, then XML, then names/globs/patterns. If a mime-type doesn't have any of these, can Tika do anything with it? If so, why is it listed in the tika-mimetypes.xml file?
The tika-mimetypes.xml file is used for both detection and information. With those entries, we can tell you something about the mimetype even when we can't detect it from a file.
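
To make the detection/information split concrete, here is a rough sketch in plain Java. It is not Tika's code or API: the class MimeRegistrySketch, its Entry type, the detect and describe methods, and the "application/x-example-info-only" type are all made up for illustration. It assumes a simplified order of magic first, then file name, and leaves out the XML root element check entirely. An entry with no magic and no glob can never be returned by detect, but describe can still report on it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class MimeRegistrySketch {

    /** Hypothetical registry entry; magic and extension are optional. */
    static final class Entry {
        final String type;
        final String description;
        final byte[] magic;      // expected leading bytes, or null if none defined
        final String extension;  // crude stand-in for a glob like *.pdf, or null

        Entry(String type, String description, byte[] magic, String extension) {
            this.type = type;
            this.description = description;
            this.magic = magic;
            this.extension = extension;
        }
    }

    // Illustrative entries only; the last one has no match criteria at all.
    static final List<Entry> ENTRIES = Arrays.asList(
        new Entry("application/pdf", "Portable Document Format",
                  "%PDF-".getBytes(StandardCharsets.US_ASCII), ".pdf"),
        new Entry("text/csv", "Comma-separated values", null, ".csv"),
        new Entry("application/x-example-info-only",
                  "Made-up type with no match criteria", null, null));

    /** Tries magic first, then the file name; falls back to a generic type. */
    static String detect(String fileName, byte[] head) {
        for (Entry e : ENTRIES) {
            if (e.magic != null && startsWith(head, e.magic)) {
                return e.type;
            }
        }
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (Entry e : ENTRIES) {
            if (e.extension != null && lower.endsWith(e.extension)) {
                return e.type;
            }
        }
        return "application/octet-stream";
    }

    /** Information lookup works for every entry, detectable or not. */
    static String describe(String type) {
        for (Entry e : ENTRIES) {
            if (e.type.equals(type)) {
                return e.description;
            }
        }
        return null;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] pdfHead = "%PDF-1.4".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect("report.bin", pdfHead));       // matched by magic
        System.out.println(detect("data.csv", new byte[0]));     // matched by name
        System.out.println(detect("mystery", new byte[0]));      // nothing to go on
        System.out.println(describe("application/x-example-info-only"));
    }
}
```

In this sketch the third entry plays the role of the types you noticed in the XML file: it never comes back from detect, yet a caller who already knows the type name (from an HTTP header or a database column, say) can still look up its description.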
Nick
