The Tika annotator currently packages the following 3rd party
artifacts:

>>> These first 2 are the Apache-licensed code of the actual Tika
project <<<

org.apache.tika:tika-core:jar:0.4:compile
org.apache.tika:tika-parsers:jar:0.4:compile

>>> These next are 3rd party Jars.  Some not from Apache <<<

org.apache.commons:commons-compress:jar:1.0:compile
pdfbox:pdfbox:jar:0.7.3:compile
org.fontbox:fontbox:jar:0.1.0:compile
org.jempbox:jempbox:jar:0.2.0:compile
bouncycastle:bcmail-jdk14:jar:136:compile
bouncycastle:bcprov-jdk14:jar:136:compile
org.apache.poi:poi:jar:3.5-beta6:compile
org.apache.poi:poi-scratchpad:jar:3.5-beta6:compile
org.apache.poi:poi-ooxml:jar:3.5-beta6:compile
org.apache.poi:ooxml-schemas:jar:1.0:compile
org.apache.xmlbeans:xmlbeans:jar:2.3.0:compile
dom4j:dom4j:jar:1.6.1:compile
xml-apis:xml-apis:jar:1.0.b2:compile
org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar:1.0:compile
commons-logging:commons-logging:jar:1.1.1:compile
net.sourceforge.nekohtml:nekohtml:jar:1.9.9:compile
xerces:xercesImpl:jar:2.8.1:compile
asm:asm:jar:3.1:compile
log4j:log4j:jar:1.2.14:compile

I'm pretty sure that the LICENSE/NOTICE files need updating to cover all
of these components.  I checked one, asm:asm:jar:3.1, and found that
its license requires certain notices, contained in the license itself,
to be included in all redistributions.
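
As a possible starting point for that review, here is a rough sketch that
splits coordinates like the ones listed above into a checklist.  It
assumes the groupId:artifactId:type:version:scope layout shown in the
list, and it treats a groupId starting with "org.apache." as "from
Apache".  That prefix test is only a first-pass heuristic of mine, not a
license check: commons-logging, xerces and log4j, for example, are Apache
projects published under older group ids, so every entry still needs to
be looked at by hand.  The function names are made up for illustration.

```python
# Sketch: sort Maven coordinates into "Apache group id" and "other"
# so that each entry can be checked for LICENSE/NOTICE requirements.

def parse_coordinate(line):
    """Split 'groupId:artifactId:type:version:scope' into a dict."""
    group, artifact, packaging, version, scope = line.strip().split(":")
    return {
        "group": group,
        "artifact": artifact,
        "type": packaging,
        "version": version,
        "scope": scope,
    }

def split_by_origin(lines):
    """Return (apache, other) lists of parsed coordinates.

    Assumption: a groupId beginning with 'org.apache.' marks an Apache
    project.  This says nothing about the actual license terms.
    """
    apache, other = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines between groups
        dep = parse_coordinate(line)
        if dep["group"].startswith("org.apache."):
            apache.append(dep)
        else:
            other.append(dep)
    return apache, other

# A few entries copied from the list above.
SAMPLE = [
    "org.apache.tika:tika-core:jar:0.4:compile",
    "pdfbox:pdfbox:jar:0.7.3:compile",
    "asm:asm:jar:3.1:compile",
    "org.apache.poi:poi:jar:3.5-beta6:compile",
]

if __name__ == "__main__":
    apache, other = split_by_origin(SAMPLE)
    for dep in other:
        print("check license for:", dep["group"], dep["artifact"], dep["version"])
```

Feeding the full list through split_by_origin would give one group to
check against the Apache license and another whose individual license
texts need to be read for notice requirements.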

The Tika project gets around this by not having any binary distribution,
so they never distribute these parts themselves.

I believe we also need to get an Export 5D002 Registration for the Tika
Annotator, and also for the Sandbox Distribution which includes it.

Any volunteers to research the License/Notice requirements for these
components?

-Marshall
