Hi

I'm the maintainer of the Ruby gem mime-types and its associated data
gem/repo, mime-types/mime-types-data.

While the vast majority of the data is pulled from the IANA media type
registry, one thing that has always been somewhat ad hoc is file extensions.
These are currently enriched from the Apache httpd MIME list, but I have been
considering extending the data with `tika-mimetypes.xml` from tika-core.

I have implemented a parser to integrate these at
https://github.com/mime-types/mime-types-data/pull/142, but before merging
it I wanted to ask whether the Tika project would consider this fair use.
In essence, the changes:

1. Parse the current `tika-mimetypes.xml` from the main branch of Tika on
GitHub.
2. Skip any `mime-type` record that has attributes (MIME::Types is
about resolving the primary media types and does not support format or
version attributes).
3. Extract the `glob` entries for use in the `extensions` field. Globs
that use `*` in the middle of a filename are excluded, because that's not
how the MIME::Types `extensions` field works (I could add a new `glob`
field, but that would take a bit more work).
4. Update the `extensions` field for any existing MIME::Type, or create
new unregistered (not defined by IANA) types for new ones.
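For illustration, the steps above can be sketched roughly as follows. This is not the code in the PR, just a minimal Python sketch under a few assumptions: "attributes" is taken to mean `;`-delimited parameters in the `type` value (e.g. `; version=...`), only globs of the form `*.ext` with no further wildcards are kept, and globs marked `isregex="true"` are skipped. The names `SAMPLE`, `extensions_from_globs`, `merge` and the registry dict shape are hypothetical, and the XML sample is made up, not real Tika data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the general shape of tika-mimetypes.xml (not real data).
SAMPLE = """<mime-info>
  <mime-type type="application/pdf">
    <glob pattern="*.pdf"/>
  </mime-type>
  <mime-type type="application/pdf; version=1.4">
    <glob pattern="*.pdf"/>
  </mime-type>
  <mime-type type="text/x-example">
    <glob pattern="*.exm"/>
    <glob pattern="foo*.bar"/>
    <glob pattern="*.ab?"/>
  </mime-type>
</mime-info>"""


def extensions_from_globs(xml_text):
    """Return {media_type: [extensions]} built from simple '*.ext' globs."""
    result = {}
    root = ET.fromstring(xml_text)
    for mt in root.iter("mime-type"):
        media_type = mt.get("type", "")
        # Assumption: "attributes" means ';' parameters such as version=...
        if ";" in media_type:
            continue
        for glob in mt.findall("glob"):
            pattern = glob.get("pattern", "")
            # Skip regex globs and anything not shaped like '*.ext'.
            if glob.get("isregex") == "true" or not pattern.startswith("*."):
                continue
            ext = pattern[2:]
            # Skip patterns with further wildcards ('*' mid-name, '?', '[').
            if any(c in ext for c in "*?["):
                continue
            exts = result.setdefault(media_type, [])
            if ext not in exts:
                exts.append(ext)
    return result


def merge(registry, extracted):
    """Add extensions to known types; create unregistered entries otherwise."""
    for media_type, exts in extracted.items():
        entry = registry.setdefault(
            media_type, {"extensions": [], "registered": False}
        )
        for ext in exts:
            if ext not in entry["extensions"]:
                entry["extensions"].append(ext)
    return registry


if __name__ == "__main__":
    registry = {"application/pdf": {"extensions": ["pdf"], "registered": True}}
    merge(registry, extensions_from_globs(SAMPLE))
    print(registry)
```

With the sample above, the versioned PDF record is skipped, `foo*.bar` and `*.ab?` are dropped, the existing `application/pdf` entry is left unchanged, and `text/x-example` is added as an unregistered type with the `exm` extension.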

-a
-- 
Austin Ziegler • halosta...@gmail.com • aus...@halostatue.ca
http://www.halostatue.ca/ • http://twitter.com/halostatue
