>
> The problem is not about the code that reads and interprets the
> database, but about the database (freedesktop.org.xml) and the related
> database description (freedesktop.org.dtd).
>
> If we wanted we could recreate both the database description (by
> reading the spec and writing our own DTD file) and even the database
> (by collecting vast amounts of content type information) under the
> Apache license, but AFAIK the current versions included in the patch
> are largely based on the GPL-licensed versions from freedesktop.org.
>
> So my suggestion would be to drop the xml and dtd files from the patch
> and replace them with configuration options for pointing the (Apache
> licensed) code to externally acquired database files.

+1 to this. Then, how about creating a separate issue to develop Tika's mime
type DTD (which, as I read the entire discussion, is fine to be "based on"
the freedesktop one, but should be our own) and a baseline mime type
database? What are your feelings about using the Nutch one as a starting point
for the Tika mime database? Again, I agree with you that this is a separate
issue w.r.t. TIKA-6, but I thought I'd ask your opinion now.

Thanks for the insight. I will update the TIKA-6 patch to address the other
issues you raised and to remove the freedesktop.org.{dtd|xml} files.

Cheers,
Chris
______________________________________________
Chris Mattmann, Ph.D.
Cognizant Development Engineer
Early Detection Research Network Project
_________________________________________________
Jet Propulsion Laboratory, Pasadena, CA
Office: 171-266B Mailstop: 171-246
_______________________________________________________
Disclaimer: The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.