Looks like there’s a problem parsing that PDF. Without the file I couldn’t say 
why, sorry.
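
One way to narrow this down without sharing the whole file might be to look at the raw bytes around the offset the exception reports. The sketch below is not part of PDFBox; the class and method names (PdfOffsetDump, parseOffset, excerpt) are made up for illustration. It assumes the number after "at offset" is a byte position in the original file, which I have not confirmed for every parser path. The excerpt can still contain customer data, so check it before posting.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a PDFBox class.
class PdfOffsetDump {
    // Matches the "at offset 9983" tail of the exception message in the log.
    private static final Pattern OFFSET = Pattern.compile("at offset (\\d+)");

    // Extracts the numeric offset from an exception message.
    static long parseOffset(String message) {
        Matcher m = OFFSET.matcher(message);
        if (!m.find()) {
            throw new IllegalArgumentException("no offset in: " + message);
        }
        return Long.parseLong(m.group(1));
    }

    // Returns the bytes within `radius` of `offset` as text,
    // with non-printable bytes shown as '.'.
    static String excerpt(byte[] data, long offset, int radius) {
        int start = (int) Math.max(0, offset - radius);
        int end = (int) Math.min(data.length, offset + radius + 1);
        StringBuilder sb = new StringBuilder();
        for (int i = start; i < end; i++) {
            int b = data[i] & 0xFF;
            sb.append(b >= 0x20 && b < 0x7F ? (char) b : '.');
        }
        return sb.toString();
    }

    // Usage: java PdfOffsetDump <file.pdf> "<exception message>"
    public static void main(String[] args) {
        try {
            byte[] data = Files.readAllBytes(Paths.get(args[0]));
            long offset = parseOffset(args[1]);
            System.out.println(excerpt(data, offset, 80));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For the log line quoted below, the message argument would be "expected='R' actual='0' at offset 9983". If the assumption about the offset holds, the output should show the dictionary entry where the parser expected an indirect reference ending in R.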

— John


> On 14 Dec 2015, at 04:37, Joe Ye <[email protected]> wrote:
> 
> Thanks for the reply John!
> 
> 
> Unfortunately we cannot supply the problem PDF as it's customer data.
> However, please see the log lines below when calling PDDocument.load:
> 
> 
> 
> org.apache.pdfbox.pdfparser.BaseParser parseCOSDictionary
> 
> WARNING: Invalid dictionary, found: '[' but expected: '/'
> 
> WARN |1214-122639 493|main|extractors.PDFTextExtractor|java.io.IOException:
> expected='R' actual='0' at offset 9983
> 
> 
> 
> As you can see, PDDocument.load throws a java.io.IOException here, whereas
> previously, with the force option set to true, load would not throw.
> 
> 
> Any idea what has caused the changed behaviour?
> 
> 
> Kind regards,
> 
> Joe
> 
> On Fri, Dec 11, 2015 at 5:59 PM, John Hewson <[email protected]> wrote:
> 
>> Hi Joe,
>> 
>> The force option in 1.8 only did one thing: it skipped invalid characters
>> in strings. We have better handling for this in 2.0, so force is no longer
>> necessary.
>> 
>> Perhaps the problem you’re encountering is due to other changes in the
>> parser in 2.0. If you could post a PDF publicly, we can take a look at it.
>> 
>> — John
>> 
>>> On 11 Dec 2015, at 07:02, Joe Ye <[email protected]> wrote:
>>> 
>>> Hi,
>>> 
>>> 
>>> With the latest version, 2.0.0-RC2, I found that the force flag of the
>>> method signature below (to skip corrupt PDF objects) no longer exists. This
>>> broke some of our existing usage. Could you advise if there's an
>>> alternative way to do it (i.e. skip corrupt objects)?
>>> 
>>> 
>>> 
>>> public static PDDocument load(InputStream input, boolean force)
>>>         throws IOException
>>> 
>>> (Javadoc: http://pdfbox.apache.org/docs/1.8.10/javadocs/org/apache/pdfbox/pdmodel/PDDocument.html)
>>> 
>>> 
>>> 
>>> Many thanks,
>>> 
>>> Joe
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>> 
>> 

