[ https://issues.apache.org/jira/browse/TIKA-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804704#action_12804704 ]
Ken Krugler commented on TIKA-370: ---------------------------------- On the list, Jukka said: {quote} Yep. I think the <optional/> setting was added in PDFBox 0.8.0 and and we lost those dependencies when upgrading from 0.7.3. No problem adding them back in. {quote} > Tika pom.xml is missing dependencies on bouncycastle jars needed by PDFBox > -------------------------------------------------------------------------- > > Key: TIKA-370 > URL: https://issues.apache.org/jira/browse/TIKA-370 > Project: Tika > Issue Type: Bug > Affects Versions: 0.6 > Reporter: Ken Krugler > Assignee: Ken Krugler > > While processing a bunch of PDFs off the web, I ran into a > ClassNotFoundException thrown inside of PDFBox: > java.lang.NoClassDefFoundError: > org/bouncycastle/jce/provider/BouncyCastleProvider > at > org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1092) > at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:573) > at > org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:235) > at > org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180) > at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56) > at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:69) > at > org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120) > at > org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101) > at > org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:114) > at bixo.parser.SimpleParser.parse(SimpleParser.java:153) > Caused by: java.lang.ClassNotFoundException: > org.bouncycastle.jce.provider.BouncyCastleProvider > at java.net.URLClassLoader$1.run(URLClassLoader.java:200) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:188) > at java.lang.ClassLoader.loadClass(ClassLoader.java:303) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) > at java.lang.ClassLoader.loadClass(ClassLoader.java:248) > I believe the issue is that the PDFBox pom.xml declares the dependency on the > missing BouncyCastleProvider jar as "optional". > <dependency> > <groupId>bouncycastle</groupId> > <artifactId>bcprov-jdk14</artifactId> > <version>136</version> > <optional>true</optional> > </dependency> > As explained in the Maven documentation, this means that Tika needs to > explicitly include the jar: > http://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html > I see a few other optional dependencies in the PDFBox pom.xml, but perhaps > the only one that's really critical is the above. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.