Hi,

On Fri, Oct 1, 2010 at 11:49 AM, Jan Høydahl / Cominvent
<[email protected]> wrote:
> * What is the correct mimetype? tika-mimetypes.xml lists
>   application/vnd.ms-tnef
>   I see references to application/ms-tnef other places, should we support both?

It looks like application/vnd.ms-tnef is the official type [1], but it
would probably be a good idea to add application/ms-tnef as an alias.
Can you file an improvement request for that?

> * The 3rd party developers do not necessarily accept contributions
>   Rather than forking, could an alternative be to build a "glue" jar of
>   the Tika files only?

I already contacted [email protected] and offered my changes for
inclusion in the upstream codebase. Having a separate glue jar for
just a single class seems a bit wasteful.

> * Could we legally include with Tika a maven target or script which downloads
>   3rd party jars? That would benefit developers (broader distribution) as well
>   as the Tika community (better file format support).

It would of course be legal to do so (i.e. we wouldn't be going to
jail for that ;-), but Apache policies (see [2], most notably [3])
put some limits on what an official Apache release can include. The
reason for those policies is to make it easy to include Apache code
in commercial products as well, which I think is a Good Thing (TM).

There's nothing stopping anyone from creating an external Tika
distribution that also contains dependencies under the GPL and other
troublesome licenses. We'd even be happy to link to such efforts from
the Tika web site, but they should remain clearly separate efforts to
avoid confusing the licensing status of the official Tika releases.

Anyway, the best long-term way forward would IMHO be to follow Nick's
suggestion to implement this feature directly in POI.

[1] http://www.iana.org/assignments/media-types/application/vnd.ms-tnef
[2] http://www.apache.org/legal/resolved.html
[3] http://www.apache.org/legal/resolved.html#criteria

BR,

Jukka Zitting
