Hi...

I need to map the MIME types that the Tika team's incredible work has put
in the corresponding configuration file:

https://svn.apache.org/repos/asf/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml


Counting tags alone, there appear to be 1,304 distinct MIME types in that
file (!), so I would like to map them to a few custom top-level categories
such as "Office", "PDF", "Audio", "Video", or similar.

Assuming this is not already done in Tika, what would be the fastest way of
reading in the 1,304 "registered" MIME types and mapping them to categories?
I am thinking of taking the above configuration file and adding a custom
"category" tag to each MIME type, so that I can then parse it as XML.
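
To make the idea concrete, here is a rough sketch of what I have in mind,
in plain Python with only the standard library. The inline XML is a tiny
stand-in I made up, assuming the file uses a <mime-info> root with
<mime-type type="..."> children and <alias type="..."/> entries. The
<category> tag is my own hypothetical addition and not part of Tika's file,
and load_categories and TOP_LEVEL_DEFAULTS are just names I picked. It does
not use any Tika code.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for tika-mimetypes.xml (assumed layout, not the real file).
# The <category> child is the hypothetical custom tag; Tika does not define it.
SAMPLE_XML = """<mime-info>
  <mime-type type="application/pdf">
    <category>PDF</category>
    <glob pattern="*.pdf"/>
  </mime-type>
  <mime-type type="application/msword">
    <category>Office</category>
    <alias type="application/vnd.ms-word"/>
  </mime-type>
  <mime-type type="audio/mpeg"/>
  <mime-type type="video/mp4"/>
  <mime-type type="application/x-unknown-thing"/>
</mime-info>"""

# Fallback used when a <mime-type> has no <category> child:
# derive the category from the top-level media type.
TOP_LEVEL_DEFAULTS = {
    "audio": "Audio",
    "video": "Video",
    "image": "Image",
    "text": "Text",
}


def load_categories(xml_text):
    """Return a dict mapping each MIME type (and its aliases) to a category."""
    root = ET.fromstring(xml_text)
    mapping = {}
    for node in root.iter("mime-type"):
        mime = node.get("type")
        if not mime:
            continue  # skip malformed entries without a type attribute
        category = node.findtext("category")
        if category is None:
            top_level = mime.split("/", 1)[0]
            category = TOP_LEVEL_DEFAULTS.get(top_level, "Other")
        category = category.strip()
        mapping[mime] = category
        # Aliases share the category of the type they point to.
        for alias in node.findall("alias"):
            alias_type = alias.get("type")
            if alias_type:
                mapping[alias_type] = category
    return mapping


categories = load_categories(SAMPLE_XML)
```

With the fallback on the top-level type, only the entries that do not fit
their prefix (mostly the application/* ones) would need an explicit
category, which should keep the hand-editing well below 1,304 entries.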

Is that sensible, and is it the fastest approach? Is there any Tika code I
should reuse, at least for parsing the MIME types from the configuration file?

Thanks!
