Hi,

On 6/13/07, Philipp Koch <[EMAIL PROTECTED]> wrote:
> i am currently also doing meta data extraction from various file
> formats and got also attracted by the introduction of the tika
> project. i found a very interesting image meta data extractor library
> which is shipped under apache license but the project itself is not
> hosted at apache (see http://www.fightingquaker.com/sanselan/).

Looks nice!

> would it make sense to ask the project owner(s) of such projects to
> move to the apache project, to also make sure that such useful libs
> will be maintained and development will continue?

It's up to the external project community to decide if they want to
become an Apache project. We can of course mention the Incubator and
offer to help if they want to bring the project to Apache, but I
wouldn't want to go on a crusade to turn all our dependencies into
Apache projects.

I think the main criteria for selecting which external libraries to
use as default parsers in Tika (a plugin interface should of course
allow any other libraries to be used instead of the defaults if
needed) would be code quality, licensing, and active maintenance. All
of these are typically well handled by Apache projects, but there's no
inherent rule that external projects couldn't meet these criteria
just as well as or even better than Apache projects.

So, once we have our act together (a working codebase and an
architectural roadmap) I think we should start contacting various
parser projects for cooperation. We should explain what we are trying
to do and, for each parser library we depend on, preferably have
someone who follows the mailing lists of both Tika and that
library. While building those bridges we could also
mention the chance of bringing external projects into Apache, but that
definitely shouldn't be a precondition for cooperation.

> ps: don't know if this is the right place for such questions....

Good as any. :-)

BR,

Jukka Zitting
