Could be a bug in PDFBox. Might want to ask on the pdfbox users' list.
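
Before posting there, it may help to capture the full stack trace. The message text looks like a ClassCastException, and the quoted program rethrows with only e.getMessage(), which drops the original cause and its stack frames. Below is a minimal sketch using only the JDK. StackTraceReport, rootCause and fullTrace are hypothetical names for illustration, not part of Tika or PDFBox, and the failing cast in main is only a stand-in for the real parse call.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Map;

// Hypothetical helper for illustration; not part of Tika or PDFBox.
final class StackTraceReport {

    // Follow getCause() links down to the innermost exception.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    // Render the full stack trace, including "Caused by:" sections, as text.
    static String fullTrace(Throwable t) {
        StringWriter out = new StringWriter();
        t.printStackTrace(new PrintWriter(out, true));
        return out.toString();
    }

    public static void main(String[] args) {
        try {
            try {
                // Stand-in for parser.parse(...): a cast that fails at runtime.
                Object notAMap = new ArrayList<String>();
                Map<?, ?> m = (Map<?, ?>) notAMap;
                System.out.println(m.size());
            } catch (Exception e) {
                // Pass the exception itself as the cause, not just its message.
                throw new RuntimeException(e);
            }
        } catch (RuntimeException wrapped) {
            System.err.println("Root cause: " + rootCause(wrapped));
            System.err.println(fullTrace(wrapped));
        }
    }
}
```

The "Caused by:" section of that output, with its PDFBox frames, is usually the part worth including in a report, along with the PDF itself if it can be shared.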

-----Original Message-----
From: question.answer...@gmail.com [mailto:question.answer...@gmail.com] 
Sent: Friday, September 16, 2016 7:30 AM
To: user@tika.apache.org
Subject: [Tika] I have a question. --> "Exception : 
org.apache.pdfbox.cos.COSArray cannot be cast to 
org.apache.pdfbox.cos.COSDictionary"

An exception is raised at the line "parser.parse(new Fil ....".

"Exception : org.apache.pdfbox.cos.COSArray cannot be cast to 
org.apache.pdfbox.cos.COSDictionary"

Why does this exception occur?
With dozens of other PDFs, the exception does not occur.



My program is below.
-----------------------------------------------------
try {
    File document = new File("/usr/local/sample.pdf");

    PDFParser parser = new PDFParser();
    ContentHandler handler = new BodyContentHandler(Integer.MAX_VALUE);
    Metadata metadata = new Metadata();
    parser.parse(new FileInputStream(document), handler, metadata,
                 new ParseContext());

    String plainText = handler.toString();
    System.out.println(plainText);
} catch (FileNotFoundException e) {
    e.printStackTrace();
    throw new RuntimeException(e.getMessage());
} catch (IOException e) {
    e.printStackTrace();
    throw new RuntimeException(e.getMessage());
} catch (SAXException e) {
    e.printStackTrace();
    throw new RuntimeException(e.getMessage());
} catch (TikaException e) {
    e.printStackTrace();
    throw new RuntimeException(e.getMessage());
} catch (Exception e) {
    e.printStackTrace();
    throw new RuntimeException(e.getMessage());
}
-----------------------------------------------------

--
syosinnsya
