Hi,

Revisiting a topic that we've already considered before (in at least [1], [2] and [3])...

I'm working on integrating Tika into Jackrabbit [4], and there we found it desirable [5] to make it easier to depend on just the core Tika classes without all the parser dependencies.

To make this happen, I'd split Tika into the following component libraries:

* tika-core - core parts of Tika; everything but cli, gui, and the parser.* packages
* tika-parsers - format-specific parser classes; with dependencies on external libraries
* tika-app - depends on all of the above; adds cli and gui; standalone jar packaging

We could (should?) further split the tika-parsers component into smaller pieces based on the external dependencies used, to allow finer-grained control over which parser libraries get included in a specific downstream package or deployment.

WDYT? If there are no objections, I'd like to target this for the Tika 0.4 release.

[1] http://markmail.org/message/n64zb3cawlm4ng3k
[2] http://markmail.org/message/ji3xabugnt6wlwdh
[3] http://markmail.org/message/2sd6d5ajhpqhcwcf
[4] https://issues.apache.org/jira/browse/JCR-1878
[5] http://markmail.org/message/cf6bj7qv7fyyxezu

BR,

Jukka Zitting