Hi, I'm the maintainer of the Ruby gem mime-types and its associated data gem/repo, mime-types/mime-types-data.
While the vast majority of the data is pulled from the IANA media type registry, one thing that has always been a bit ad hoc is extensions. These are currently enriched from the Apache httpd MIME list, but I have been considering extending the data with the `tika-mimetypes.xml` from tika-core.

I have implemented a parser to integrate these at https://github.com/mime-types/mime-types-data/pull/142, but before merging it I wanted to ask whether this would be considered fair use by the Tika project.

The changes essentially:

1. Parse the current `tika-mimetypes.xml` from the main branch of Tika on GitHub.
2. Skip over any `mime-type` record that has attributes (MIME::Types is about resolving the primary media types and does not support format or version attributes).
3. Extract the `glob` entries for use in the `extensions` field. Globs that use `*` in the middle of a filename are excluded, because that's not how the Ruby MIME::Types field works (I could add a new `glob` field, but that will take a bit more work).
4. Update the `extensions` field for any existing MIME::Type, or create new unregistered (not defined in IANA) types for new ones.

-a
-- 
Austin Ziegler • halosta...@gmail.com • aus...@halostatue.ca
http://www.halostatue.ca/ • http://twitter.com/halostatue
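
P.S. To make steps 2 to 4 concrete, here is a rough sketch of the idea. It is an illustration only, not the code in the pull request: the helper names `extensions_from_tika` and `merge_extensions` are hypothetical, the use of REXML is an assumption, and treating only `*.ext` patterns as extensions (dropping literal filenames as well as mid-name wildcards) is a simplification.

```ruby
require "rexml/document"

# Hypothetical helper (not part of mime-types or Tika).
# Returns a Hash of { "type/subtype" => ["ext", ...] } from Tika-style XML.
def extensions_from_tika(xml)
  doc = REXML::Document.new(xml)
  result = {}

  doc.root.elements.each("mime-type") do |node|
    type = node.attributes["type"]
    # Step 2: skip records whose type carries parameters such as
    # ";format=..." or ";version=...".
    next if type.nil? || type.include?(";")

    exts = []
    node.elements.each("glob") do |glob|
      pattern = glob.attributes["pattern"].to_s
      # Step 3 (assumption): keep only simple "*.ext" globs; anything with
      # a wildcard elsewhere in the name is dropped.
      match = pattern.match(/\A\*\.([^*?\[\]]+)\z/)
      exts << match[1].downcase if match
    end

    result[type] = exts.uniq unless exts.empty?
  end

  result
end

# Step 4: extend the extensions of known types and add entries for new ones.
# Both arguments are plain Hashes here, standing in for the real data store.
def merge_extensions(existing, incoming)
  merged = existing.transform_values(&:dup)
  incoming.each do |type, exts|
    merged[type] = ((merged[type] || []) + exts).uniq
  end
  merged
end

sample = <<~XML
  <mime-info>
    <mime-type type="application/x-example">
      <glob pattern="*.exa"/>
      <glob pattern="example*.cfg"/>
    </mime-type>
    <mime-type type="application/x-example;version=2">
      <glob pattern="*.ex2"/>
    </mime-type>
  </mime-info>
XML

p merge_extensions({ "text/plain" => ["txt"] }, extensions_from_tika(sample))
```

With the made-up sample above, only `exa` is picked up for `application/x-example`: the `example*.cfg` glob is dropped for its mid-name wildcard, and the `;version=2` record is skipped entirely.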