On Mon, 18 Jun 2012, Doug wrote:
I'm planning to use TIKA as part of a process for cataloging data on a share drive. Based on the website and tika-mimetypes.xml, the type detection looks pretty comprehensive. However, while browsing tika-mimetypes.xml, I noticed that about half of the mime-types listed have no associated glob, root-XML, or magic elements. Without these match criteria, can TIKA ever actually detect a file of one of these types?

For a type to be detected, Tika needs something to go on. That could be a glob, an XML root element, some magic, or a combination of them.

I browsed the detector source. It looks like it tries to match against
magic, then XML, then names/globs/patterns. If a mime-type doesn't have any
of these, can TIKA do anything with it? If so, why is it listed in the
tika-mimetypes.xml file?

The tika-mimetypes.xml file is used for both detection and information. With those entries, we can tell you something about the mimetype, even if we can't always detect it.
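
As a rough illustration of that split between detection and information, here is a toy sketch in Python. It is not Tika's code or API: the TypeEntry class, the REGISTRY contents, and the detect and describe helpers are all made up for this example, the XML root element step is left out, and the "application/x-example-undetectable" type is hypothetical. It only shows that an entry with no magic or glob can never be returned by detection, yet can still be looked up by name.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from typing import List, Optional


@dataclass
class TypeEntry:
    """One registry entry; magic and globs are both optional."""
    name: str
    description: str = ""
    magic: Optional[bytes] = None            # leading bytes identifying the type
    globs: List[str] = field(default_factory=list)


# Hypothetical registry, standing in for a parsed mimetypes file.
REGISTRY = {
    entry.name: entry
    for entry in [
        TypeEntry("application/pdf", "Portable Document Format",
                  magic=b"%PDF-", globs=["*.pdf"]),
        TypeEntry("text/csv", "Comma-separated values", globs=["*.csv"]),
        # No magic, no globs: informational only, never detected.
        TypeEntry("application/x-example-undetectable",
                  "Hypothetical type with no match criteria"),
    ]
}


def detect(data: bytes, filename: str = "") -> str:
    """Try magic bytes first, then filename globs, then fall back."""
    for entry in REGISTRY.values():
        if entry.magic and data.startswith(entry.magic):
            return entry.name
    for entry in REGISTRY.values():
        if any(fnmatch(filename, pattern) for pattern in entry.globs):
            return entry.name
    return "application/octet-stream"


def describe(name: str) -> Optional[str]:
    """Look up an entry by name, whether or not it is detectable."""
    entry = REGISTRY.get(name)
    return entry.description if entry else None
```

Here detect(b"%PDF-1.4", "report.bin") matches on magic despite the misleading filename, while describe("application/x-example-undetectable") still returns a description for a type that detect can never produce.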

Nick
