Hi Maruan, thanks for the swift response. It looks like it’s 1.6.0 (quite 
old?); that’s certainly the .jar that’s sitting in the DSpace lib directory. 
I’ve copied in George as he’s investigating this too. George, I take it we’re 
OK to send Maruan a link to the relevant records in the repository?

Cheers
Scott
-- 
Scott Renton

Digital Development
Library and University Collections
Argyle House, Floor F
ext: 515219

On 01/06/2017 11:18, "Maruan Sahyoun" <[email protected]> wrote:

>Hi Scott,
>
>which version of PDFBox are you using? Is it possible to share one of the PDFs 
>at a public location?
>
>BR
>Maruan
>
>> On 01.06.2017 at 12:11, RENTON Scott <[email protected]> wrote:
>> 
>> 
>> Hi folks (apologies, hit send too soon)
>> 
>> We run PDFBox for PDF text extraction under the DSpace application.
>> 
>> Occasionally we get the odd failure, and we’re investigating some errors 
>> just now. I’m just wondering what property of the PDF in question it’s 
>> looking at here, and if there’s any way we can mitigate that. It’s 
>> certainly not the title.
>> 
>> 
>> One is:
>> java.lang.RuntimeException: java.io.IOException: Not a number: +
>> at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:178)
>> at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:187)
>> at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:266)
>> at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
>> at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
>> at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:442)
>> at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:366)
>> at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:322)
>> at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:101)
>> 
>> 
>> And here’s another:
>> 
>> java.lang.NumberFormatException: For input string: "dup"
>> at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>> at java.lang.Integer.parseInt(Integer.java:492)
>> at java.lang.Integer.parseInt(Integer.java:527)
>> at org.apache.pdfbox.pdmodel.font.PDType1Font.getEncodingFromFont(PDType1Font.java:344)
>> at org.apache.pdfbox.pdmodel.font.PDType1Font.determineEncoding(PDType1Font.java:280)
>> at org.apache.pdfbox.pdmodel.font.PDFont.<init>(PDFont.java:181)
>> at org.apache.pdfbox.pdmodel.font.PDSimpleFont.<init>(PDSimpleFont.java:83)
>> at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:152)
>> at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:108)
>> at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:5)
>> at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:115)
>> 
>> Thanks
>> Scott
>> -- 
>> Scott Renton
>> Digital Development
>> Library and University Collections
>> Argyle House, Floor F
>> ext: 515219
>> 
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>
