Hi Chris,

I didn't know Tika's mime type detection was based on freedesktop.org. I've also developed a mime type detection system built on top of freedesktop, leveraging the shared-mime-info database for accuracy. Is this what you guys have done as well? In any case, the point I was trying to make in my previous post was to leverage functionality that is available elsewhere as much as possible and focus on Tika's core features. True, mime type detection is important for Tika. However, as you pointed out, mime type detection is a project by itself. If the idea of creating a commons.xxx project for mime detection was floating around earlier, why not start an Apache commons.xxx project based on Tika's detection schema? Now would be a good time, don't you think? It would be a great addition to commons and would free Tika developers from maintaining the code base for it.

All the best,

Stephane Bastian

Mattmann, Chris A wrote:
Hi Stephane,


This is definitely good news. Besides very good parsers, Aperture also
has strong support for mime types. I know we also have support for
detecting mime types, but at some point in time we may consider using
theirs and focus solely on writing parsers?

I would be strongly against this, mainly because there is almost a
1-to-1 correspondence between having a good mime detection system and
parsing content. Tika has a fairly robust mime system based on
freedesktop.org's system, and I think there is value in Apache having a good
mime detection system (in fact it was discussed, even before Tika's
inception, to take the Nutch mime type code and turn it into a commons-*
project).

Thanks,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.


