The Tika annotator currently packages the following 3rd party artifacts:

>>> These first 2 are the Apache Licensed code of the actual Tika project <<<
org.apache.tika:tika-core:jar:0.4:compile
org.apache.tika:tika-parsers:jar:0.4:compile

>>> These next are 3rd party Jars. Some not from Apache <<<
org.apache.commons:commons-compress:jar:1.0:compile
pdfbox:pdfbox:jar:0.7.3:compile
org.fontbox:fontbox:jar:0.1.0:compile
org.jempbox:jempbox:jar:0.2.0:compile
bouncycastle:bcmail-jdk14:jar:136:compile
bouncycastle:bcprov-jdk14:jar:136:compile
org.apache.poi:poi:jar:3.5-beta6:compile
org.apache.poi:poi-scratchpad:jar:3.5-beta6:compile
org.apache.poi:poi-ooxml:jar:3.5-beta6:compile
org.apache.poi:ooxml-schemas:jar:1.0:compile
org.apache.xmlbeans:xmlbeans:jar:2.3.0:compile
dom4j:dom4j:jar:1.6.1:compile
xml-apis:xml-apis:jar:1.0.b2:compile
org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar:1.0:compile
commons-logging:commons-logging:jar:1.1.1:compile
net.sourceforge.nekohtml:nekohtml:jar:1.9.9:compile
xerces:xercesImpl:jar:2.8.1:compile
asm:asm:jar:3.1:compile
log4j:log4j:jar:1.2.14:compile

I'm pretty sure that the LICENSE/NOTICE files need updating to cover all of these components. I checked one, asm:asm:jar:3.1, and found that its license requires including certain notices, contained in the license itself, in all redistributions. The Tika project gets around this by not having any binary distribution, so they never distribute these parts themselves.

I believe we also need to get an Export 5D002 Registration for the Tika Annotator, and also for the Sandbox Distribution which includes it.

Any volunteers to research the License/Notice requirements for these components?

-Marshall
