Sorry for the complete newbie question, but do I just need to install BouncyCastle separately, or will I need to change the Tika pom file (or some other Tika file) to include BouncyCastle when I build Tika?

Thanks

On Oct 14, 2009, at 6:01 PM, Benson Margulies wrote:

Well, you need BouncyCastle for the encryption code.
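One way to confirm that BouncyCastle is the missing piece, before touching any pom file, is a tiny standalone Java check. This is a sketch, not part of Tika or PDFBox: the class ClasspathCheck is hypothetical, and the only real name it relies on is org.bouncycastle.jce.provider.BouncyCastleProvider, copied from the NoClassDefFoundError in the original message.

```java
public class ClasspathCheck {
    // Returns true if the named class can be located on this program's classpath.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: locate the class without running its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // the class file was found but something it depends on is missing
            return false;
        }
    }

    public static void main(String[] args) {
        // The provider class PDFBox asked for in the NoClassDefFoundError
        String provider = "org.bouncycastle.jce.provider.BouncyCastleProvider";
        System.out.println(provider + (isOnClasspath(provider) ? " found" : " NOT found"));
    }
}
```

Compile it and run it with the same jars you pass to Tika, e.g. `java -cp .:bcprov.jar ClasspathCheck`, where bcprov.jar is a placeholder for whichever BouncyCastle provider jar you downloaded (use `;` instead of `:` on Windows). It only tells you whether the class is visible; whether Tika's pom should declare the dependency is a separate question.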

On Wed, Oct 14, 2009 at 5:50 PM, Daniel Higginbotham <dan...@flyingmachinestudios.com> wrote:
Hello,

I just exported Tika from Subversion, built the jars, and tried to use tika-app to extract data from a copy-protected PDF. This is the error I got:

$ java -jar tika-app/target/tika-app-0.5-SNAPSHOT.jar file.pdf
Exception in thread "main" java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider
       at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1108)
       at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:573)
       at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:235)
       at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)
       at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
       at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:70)
       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:103)
       at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:174)
       at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:63)


I don't really know Java, but I need to be able to extract data from copy-protected PDFs in Solr.

I tried copying tika-core.jar and tika-parsers.jar into Solr so that it would use them, and in the Solr output I get the following:

Oct 14, 2009 5:26:59 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NoClassDefFoundError: org/apache/pdfbox/pdmodel/PDDocument
       at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:54)
       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:103)
       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116)
       at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
       at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
       at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
       at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
       at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
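This trace differs from the first one: the missing class is PDFBox's own PDDocument, which suggests Solr can see the two Tika jars but not the libraries tika-parsers depends on. A hedged way to list which classes are absent from a given set of jars is the sketch below. The class MissingClassReport is hypothetical; the default class names are the ones that appear in the stack traces in this thread. It checks the classpath you launch it with, not Solr's webapp class loader, so run it against the same jars you copied into Solr.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MissingClassReport {
    // Returns the class names from the input that cannot be loaded.
    static List<String> missing(List<String> classNames) {
        List<String> result = new ArrayList<String>();
        ClassLoader loader = MissingClassReport.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: only locate the class, don't run static initializers
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException e) {
                result.add(name);
            } catch (LinkageError e) {
                // class file found, but a class it needs (e.g. a superclass) is missing
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Defaults come from the stack traces in this thread; override via args.
        List<String> required = args.length > 0
                ? Arrays.asList(args)
                : Arrays.asList(
                        "org.apache.tika.parser.pdf.PDFParser",
                        "org.apache.pdfbox.pdmodel.PDDocument",
                        "org.bouncycastle.jce.provider.BouncyCastleProvider");
        List<String> absent = missing(required);
        if (absent.isEmpty()) {
            System.out.println("All required classes found.");
        } else {
            for (String name : absent) {
                System.out.println("MISSING: " + name);
            }
        }
    }
}
```

Each MISSING line points at a jar (Tika, PDFBox, or BouncyCastle) that would also need to be visible to Solr.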


Any help would be greatly appreciated!

Thank you,
Daniel Higginbotham

