Looks like there’s a problem parsing that PDF. Without the file I couldn’t say why, sorry.
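
If the immediate need is only to keep one unreadable file from aborting a larger run, the IOException can be contained at the call site. The following is a minimal sketch, not a replacement for the old force flag: it does not repair or skip corrupt objects inside a PDF, it only turns a failed load into an empty result. The names TolerantLoad, DocumentLoader and tryLoad are hypothetical; DocumentLoader stands in for a parser entry point such as PDDocument.load(InputStream), so the sketch runs without PDFBox on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Optional;

public class TolerantLoad {

    /**
     * Hypothetical stand-in for a parser entry point that may throw
     * IOException, e.g. PDDocument.load(InputStream) in PDFBox 2.0.
     */
    @FunctionalInterface
    public interface DocumentLoader<T> {
        T load(InputStream in) throws IOException;
    }

    /**
     * Attempts to load a document. A parse failure is logged and reported
     * as an empty Optional instead of propagating to the caller.
     */
    public static <T> Optional<T> tryLoad(DocumentLoader<T> loader, InputStream in) {
        try {
            return Optional.ofNullable(loader.load(in));
        } catch (IOException e) {
            // The document could not be parsed: log it and let the caller move on.
            System.err.println("Skipping unreadable document: " + e.getMessage());
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(new byte[0]);
        // Simulated loader that fails the way the log in this thread does.
        DocumentLoader<String> failing = s -> {
            throw new IOException("expected='R' actual='0' at offset 9983");
        };
        Optional<String> doc = tryLoad(failing, in);
        System.out.println(doc.isPresent() ? "loaded" : "skipped");
    }
}
```

With PDFBox 2.0 on the classpath, the loader argument would presumably be a method reference to PDDocument.load, and the caller would still be responsible for closing any document that is returned.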
— John

> On 14 Dec 2015, at 04:37, Joe Ye <[email protected]> wrote:
>
> Thanks for the reply John!
>
> Unfortunately we cannot supply the problem PDF as it's customer data.
> However, please see the log lines below when calling PDDocument.load:
>
> org.apache.pdfbox.pdfparser.BaseParser parseCOSDictionary
> WARNING: Invalid dictionary, found: '[' but expected: '/'
> WARN |1214-122639 493|main|extractors.PDFTextExtractor|java.io.IOException: expected='R' actual='0' at offset 9983
>
> As you can see, PDDocument.load throws a java.io.IOException here, whereas
> previously, with the force option set to true, load would not throw.
>
> Any idea what has caused the changed behaviour?
>
> Kind regards,
>
> Joe
>
> On Fri, Dec 11, 2015 at 5:59 PM, John Hewson <[email protected]> wrote:
>
>> Hi Joe,
>>
>> The force option in 1.8 only did one thing: it skipped invalid characters
>> in strings. We have better handling for this in 2.0 and so force is no
>> longer necessary.
>>
>> Perhaps the problem you’re encountering is due to other changes in the
>> parser in 2.0. If you could post a PDF publicly then we can take a look
>> at it.
>>
>> — John
>>
>>> On 11 Dec 2015, at 07:02, Joe Ye <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> With the latest version 2.0.0-RC2, I found that the force flag of the
>>> below method signature (to skip corrupt PDF objects) no longer exists.
>>> This broke some of our existing usage. Could you advise if there's an
>>> alternative way to do it (i.e. skip corrupt objects)?
>>>
>>> public static PDDocument load(InputStream input,
>>>                               boolean force)
>>>                        throws IOException
>>>
>>> <http://pdfbox.apache.org/docs/1.8.10/javadocs/org/apache/pdfbox/pdmodel/PDDocument.html>
>>>
>>> Many thanks,
>>>
>>> Joe

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

