Re: Unable to extract PDF content

Saphira Fri, 26 Nov 2010 02:58:25 -0800

That's what it says. I think it is always the same error:


org.apache.tika.exception.TikaException: Unable to extract PDF content
        at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:61)
        at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:79)
        at org.apache.nutch.parse.tika.TikaParser.getParse(TikaParser.java:95)
        at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:18)
        at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:1)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.pdfbox.exceptions.WrappedIOException: OperatorProcessor class org.pdfbox.util.operator.ShowTextGlyph could not be instantiated
        at org.apache.pdfbox.util.PDFStreamEngine.<init>(PDFStreamEngine.java:152)
        at org.apache.pdfbox.util.PDFTextStripper.<init>(PDFTextStripper.java:129)
        at org.apache.tika.parser.pdf.PDF2XHTML.<init>(PDF2XHTML.java:69)
        at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
        ... 7 more
Caused by: java.lang.ClassCastException: org.pdfbox.util.operator.ShowTextGlyph cannot be cast to org.apache.pdfbox.util.operator.OperatorProcessor
        at org.apache.pdfbox.util.PDFStreamEngine.<init>(PDFStreamEngine.java:146)
        ... 10 more
2010-11-26 08:27:42,113 WARN  fetcher.Fetcher - Error parsing: http://www.egamaster.com/datos/politica_fr.pdf: failed(2,0): Unable to extract PDF content

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Unable-to-extract-PDF-content-tp1971600p1972145.html
Sent from the Nutch - User mailing list archive at Nabble.com.
