Hi Scott,

Which version of PDFBox are you using? Could you share one of the PDFs
at a public location?

BR
Maruan

> On 01.06.2017 at 12:11, RENTON Scott <[email protected]> wrote:
> 
> 
> Hi folks (apologies, hit send too soon)
> 
> We run PDFBox for PDF text extraction under the DSpace application.
> 
> Occasionally we get the odd failure, and we’re investigating some errors just
> now. I’m wondering which property of the PDF in question PDFBox is looking at
> here, and whether there’s any way we can mitigate that. It’s certainly not
> the title.
> 
> 
> One is:
> java.lang.RuntimeException: java.io.IOException: Not a number: +
> at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:178)
> at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:187)
> at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:266)
> at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
> at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
> at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:442)
> at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:366)
> at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:322)
> at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:101)
> 
> 
> And here’s another:
> 
> java.lang.NumberFormatException: For input string: "dup"
> at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Integer.parseInt(Integer.java:492)
> at java.lang.Integer.parseInt(Integer.java:527)
> at org.apache.pdfbox.pdmodel.font.PDType1Font.getEncodingFromFont(PDType1Font.java:344)
> at org.apache.pdfbox.pdmodel.font.PDType1Font.determineEncoding(PDType1Font.java:280)
> at org.apache.pdfbox.pdmodel.font.PDFont.<init>(PDFont.java:181)
> at org.apache.pdfbox.pdmodel.font.PDSimpleFont.<init>(PDSimpleFont.java:83)
> at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:152)
> at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:108)
> at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:
> 5)
> at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:115)
> 
> Thanks
> Scott
> -- 
> Scott Renton
> Digital Development
> Library and University Collections
> Argyle House, Floor F
> ext: 515219
> 
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]

