On Tue, 7 Feb 2012, Public Network Services wrote:
> Counting tags only, apparently there are 1,304 different variations of MIME types there (!), so I would like to map them to, say, a few custom top-level categories like "Office", "PDF", "Audio", "Video", or similar.
>
> Assuming this is not done in Tika, what would be the fastest way of parsing
> in the 1,304 "registered" MIME types and mapping them to categories?

Audio and Video should be easy: they're already encoded in the mimetypes themselves, as the top-level type (audio/*, video/*).
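
As a rough illustration of that point, here is a minimal sketch that uses only the part of the type before the slash. The class name, method name and the "Other" fallback are made up for this example and are not part of Tika.

```java
import java.util.Locale;

// Hypothetical helper, not a Tika class: categorises by top-level type only.
public class TopLevelCategory {

    // Returns "Audio" or "Video" when the top-level type says so, else "Other".
    static String categorize(String mimeType) {
        int slash = mimeType.indexOf('/');
        if (slash < 0) {
            return "Other"; // not a type/subtype string
        }
        String top = mimeType.substring(0, slash).trim().toLowerCase(Locale.ROOT);
        switch (top) {
            case "audio":
                return "Audio";
            case "video":
                return "Video";
            default:
                return "Other"; // needs some other rule, e.g. the type hierarchy
        }
    }

    public static void main(String[] args) {
        System.out.println(categorize("audio/mpeg"));
        System.out.println(categorize("video/mp4"));
        System.out.println(categorize("application/pdf"));
    }
}
```

Anything that falls through to "Other" here still needs a different rule.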

Tika mimetypes do have a hierarchy, so you can get some information from that. For example, all the OLE2-based MS Office formats share a common parent, as do the OOXML ones, the Apple iWork ones, etc.
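
A sketch of the hierarchy idea: walk up from a type to its parents until you reach one you have assigned a category to. To keep it self-contained, the parent map below is hand-written and only a stand-in for the real hierarchy; the specific parent names and the class itself are assumptions for illustration. In real code the parent lookup would come from Tika's type registry instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not a Tika class: categorises by walking a parent chain.
public class HierarchyCategory {

    // Stand-in for the real type hierarchy (child -> parent); entries are assumed.
    static final Map<String, String> PARENT = new HashMap<>();
    // Categories assigned to a handful of parent or leaf types.
    static final Map<String, String> CATEGORY = new HashMap<>();

    static {
        PARENT.put("application/msword", "application/x-tika-msoffice");
        PARENT.put("application/vnd.ms-excel", "application/x-tika-msoffice");
        PARENT.put(
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
            "application/x-tika-ooxml");

        CATEGORY.put("application/x-tika-msoffice", "Office");
        CATEGORY.put("application/x-tika-ooxml", "Office");
        CATEGORY.put("application/pdf", "PDF");
    }

    // Climbs the parent chain until a categorised type is found.
    static String categorize(String type) {
        String current = type;
        int steps = 0; // guard against accidental cycles in the map
        while (current != null && steps++ < 32) {
            String category = CATEGORY.get(current);
            if (category != null) {
                return category;
            }
            current = PARENT.get(current);
        }
        return "Other";
    }

    public static void main(String[] args) {
        System.out.println(categorize("application/msword"));
        System.out.println(categorize("application/pdf"));
        System.out.println(categorize("text/plain"));
    }
}
```

With this approach you only categorise the few common parents by hand, and the many concrete subtypes inherit the category from them.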

Nick
