I'm sorry for my complete noob question, but do I just need to
install BouncyCastle separately, or will I need to change the Tika pom
file or some other Tika file to include BouncyCastle when I build Tika?
Thanks
On Oct 14, 2009, at 6:01 PM, Benson Margulies wrote:
Well, you need BouncyCastle for the encryption code.
On Wed, Oct 14, 2009 at 5:50 PM, Daniel Higginbotham <dan...@flyingmachinestudios.com> wrote:
Hello,
I just exported Tika from Subversion, built the jars, and tried
to use tika-app to extract data from a copy-protected PDF. This is
the error I got:
$ java -jar tika-app/target/tika-app-0.5-SNAPSHOT.jar file.pdf
Exception in thread "main" java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider
    at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1108)
    at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:573)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:235)
    at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)
    at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:70)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:103)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:174)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:63)
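The NoClassDefFoundError in this trace means the JVM could not load org.bouncycastle.jce.provider.BouncyCastleProvider at run time. One way to confirm that for a given classpath is to ask the class loader directly. The following is only a minimal sketch: the class name ClasspathCheck and the helper isLoadable are made up for illustration, and only the two class names being checked come from the traces in this thread. It reports on whatever classpath it is started with, which is not necessarily the one "java -jar" uses.

```java
public class ClasspathCheck {

    // Returns true if the named class can be located by this class's loader.
    // The class is looked up but not initialized (second argument is false).
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError for missing dependencies.
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names taken from the two stack traces in this thread.
        String[] names = {
            "org.bouncycastle.jce.provider.BouncyCastleProvider",
            "org.apache.pdfbox.pdmodel.PDDocument"
        };
        for (String name : names) {
            System.out.println(name + " -> " + (isLoadable(name) ? "found" : "MISSING"));
        }
    }
}
```

After compiling with javac, it could be run as, for example, java -cp ".:lib/*" ClasspathCheck, where lib is a placeholder for whichever directory holds your jars.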
I don't really know Java, but I need to be able to extract data from
copy-protected PDFs in Solr.
I tried copying tika-core.jar and tika-parsers.jar so that they
would be used by Solr, and in the Solr output I get the following:
Oct 14, 2009 5:26:59 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NoClassDefFoundError: org/apache/pdfbox/pdmodel/PDDocument
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:54)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:103)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
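This second trace shows Solr failing to load org/apache/pdfbox/pdmodel/PDDocument, which suggests that no jar providing that class is visible to Solr alongside tika-core.jar and tika-parsers.jar. A rough way to check is to scan a directory of jars for the class file. This is a sketch only: JarClassFinder, toEntryName and jarsContaining are hypothetical names invented for this example, and the directory to scan is something you supply yourself.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

public class JarClassFinder {

    // Converts a fully qualified class name into its path inside a jar,
    // e.g. "a.b.C" becomes "a/b/C.class".
    static String toEntryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Lists the *.jar files directly inside dir that contain the given class.
    static List<Path> jarsContaining(Path dir, String className) {
        String entry = toEntryName(className);
        List<Path> hits = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : jars) {
                try (JarFile jf = new JarFile(jar.toFile())) {
                    if (jf.getEntry(entry) != null) {
                        hits.add(jar);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    public static void main(String[] args) {
        // args[0]: directory of jars to scan (you choose the path)
        // args[1]: fully qualified class name to look for
        Path dir = Paths.get(args[0]);
        List<Path> hits = jarsContaining(dir, args[1]);
        if (hits.isEmpty()) {
            System.out.println("No jar in " + dir + " contains " + args[1]);
        } else {
            hits.forEach(p -> System.out.println("Found in " + p));
        }
    }
}
```

Running it with a jar directory and org.apache.pdfbox.pdmodel.PDDocument (or org.bouncycastle.jce.provider.BouncyCastleProvider) as arguments would show whether any jar in that directory supplies the class. It only looks at jars directly inside the given directory, not in subdirectories.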
Any help would be greatly appreciated!
Thank you,
Daniel Higginbotham