This could be the following issue:
https://issues.apache.org/jira/browse/PDFBOX-455

... but the stack trace differs a bit.

That issue is flagged as fixed in version 0.8.0-incubator.

It looks like tika-parsers v0.5 referenced an earlier PDFBox version (0.7.3).
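
To check which PDFBox version actually ends up on the classpath, something like the sketch below might help. It is only an illustration: the class name ClasspathVersionCheck and the helper describe are made up for this example, and the version it prints comes from the jar manifest, which may be missing (in which case it reports "unknown"). The two class names looked up in main are taken from the stack trace in the quoted message.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Tika or PDFBox): reports where a class was
// loaded from and which version its jar manifest declares, if any.
public class ClasspathVersionCheck {

    static String describe(String className) {
        try {
            // Load without running static initializers; we only need metadata.
            Class<?> cls = Class.forName(
                    className, false, ClasspathVersionCheck.class.getClassLoader());

            // Implementation-Version from the jar manifest; null if not declared.
            Package pkg = cls.getPackage();
            String version = (pkg == null) ? null : pkg.getImplementationVersion();

            // The code source is null for classes loaded by the bootstrap loader.
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            String location = (src == null || src.getLocation() == null)
                    ? "unknown"
                    : src.getLocation().toString();

            return className
                    + " version=" + (version == null ? "unknown" : version)
                    + " from=" + location;
        } catch (ClassNotFoundException e) {
            return className + " is not on the classpath";
        }
    }

    public static void main(String[] args) {
        // Class names as they appear in the stack trace of the quoted message.
        System.out.println(describe("org.apache.pdfbox.util.PDFStreamEngine"));
        System.out.println(describe("org.apache.tika.Tika"));
    }
}
```

If the reported location points at a 0.7.3 jar, that would be consistent with the older PDFBox being pulled in through tika-parsers.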


On Mon, Jun 18, 2012 at 6:12 PM, Mark Kerzner <[email protected]> wrote:
> Hi,
>
> I get this exception (see stack trace below), even though I am seemingly
> catching it in my code, which says this
>
>     public void parse(String fileName, Metadata metadata) {
>         TikaInputStream inputStream = null;
>         try {
>             // the given input stream is closed by the parseToString method
>             // (see Tika documentation)
>             // we will close it just in case :)
>             inputStream = TikaInputStream.get(new File(fileName));
>             String text = tika.parseToString(inputStream, metadata); // <-- exception happens here
>             metadata.set(DocumentMetadataKeys.DOCUMENT_TEXT, text);
>         } catch (Exception e) {
>             // the show must still go on
>             History.appendToHistory("Exception: " + e.getMessage());
>             metadata.set(DocumentMetadataKeys.PROCESSING_EXCEPTION, e.getMessage());
>         } catch (OutOfMemoryError m) {
>             History.appendToHistory("Memory Exception: " + m.getMessage());
>             metadata.set(DocumentMetadataKeys.PROCESSING_EXCEPTION, m.getMessage());
>         } finally {
>             if (inputStream != null) {
>                 try {
>                     inputStream.close();
>                 } catch (Exception e) {
>                     e.printStackTrace(System.out);
>                 }
>             }
>         }
>     }
>
>
> 2012-06-19 00:47:06,425 WARN org.apache.pdfbox.util.PDFStreamEngine: java.lang.ClassCastException: org.apache.pdfbox.cos.COSFloat cannot be cast to org.apache.pdfbox.cos.COSName
> java.lang.ClassCastException: org.apache.pdfbox.cos.COSFloat cannot be cast to org.apache.pdfbox.cos.COSName
>     at org.apache.pdfbox.util.operator.SetGraphicsStateParameters.process(SetGraphicsStateParameters.java:48)
>     at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:551)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:274)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
>     at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
>     at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:442)
>     at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:366)
>     at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:322)
>     at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:61)
>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:96)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:136)
>     at org.apache.tika.parser.mail.MailContentHandler.body(MailContentHandler.java:82)
>     at org.apache.james.mime4j.parser.MimeStreamParser.parse(MimeStreamParser.java:133)
>     at org.apache.tika.parser.mail.RFC822Parser.parse(RFC822Parser.java:76)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>     at org.apache.tika.Tika.parseToString(Tika.java:380)
>     at org.freeeed.main.DocumentParser.parse(DocumentParser.java:33)
>
>
>
> That's from testing on the Enron data set.



-- 
Jon Gorrono
PGP Key: 0x5434509D -
http{pgp.mit.edu:11371/pks/lookup?search=0x5434509D&op=index}
GSWoT Introducer - {GSWoT:US75 5434509D Jon P. Gorrono <jpgorrono -
www.gswot.org>}
http{middleware.ucdavis.edu}
