Hi Scott, which version of PDFBox are you using? Is it possible to share one of the PDFs at a public location?
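
In the meantime, one way to keep a single bad PDF from aborting a whole batch is to catch the failure around the extraction call and log its root cause. The following is only a minimal sketch, not PDFBox or DSpace code: `ExtractionGuard`, `rootCause` and `tryExtract` are hypothetical names, and the `Callable` stands in for whatever actually invokes the text extraction.

```java
import java.io.IOException;
import java.util.Optional;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of PDFBox or DSpace.
public class ExtractionGuard {

    /** Walks the cause chain and returns the innermost exception. */
    public static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    /**
     * Runs an extraction step and returns its text, or an empty Optional
     * if the step throws (checked or unchecked). The root cause is logged
     * so the offending file can be inspected later.
     */
    public static Optional<String> tryExtract(String name, Callable<String> extraction) {
        try {
            return Optional.ofNullable(extraction.call());
        } catch (Exception e) {
            Throwable root = rootCause(e);
            System.err.println("Skipping " + name + ": "
                    + root.getClass().getName() + ": " + root.getMessage());
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Stand-ins for real extraction calls; the second mimics the
        // wrapped IOException from the first stack trace.
        Optional<String> ok = tryExtract("good.pdf", () -> "some text");
        Optional<String> bad = tryExtract("bad.pdf", () -> {
            throw new RuntimeException(new IOException("Not a number: +"));
        });
        System.out.println("good.pdf extracted: " + ok.isPresent());
        System.out.println("bad.pdf extracted: " + bad.isPresent());
    }
}
```

This only contains the failure; it does not repair the PDF or change how the content stream or font is parsed.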

BR
Maruan

> On 01.06.2017 at 12:11, RENTON Scott <[email protected]> wrote:
>
> Hi folks (apologies, hit send too soon)
>
> We run pdfbox for pdf text extraction under the DSpace application.
>
> Occasionally we get the odd failure, and we’re investigating some errors just
> now. I’m just wondering what property of the PDF in question it’s looking at
> here, and if there’s any way we can mitigate against that. It’s certainly not
> the title.
>
> One is:
>
> java.lang.RuntimeException: java.io.IOException: Not a number: +
> java.lang.RuntimeException: java.io.IOException: Not a number: +
>     at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:178)
>     at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:187)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:266)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
>     at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
>     at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:442)
>     at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:366)
>     at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:322)
>     at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:101)
>
> And here’s another:
>
> java.lang.NumberFormatException: For input string: "dup"
> java.lang.NumberFormatException: For input string: "dup"
>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>     at java.lang.Integer.parseInt(Integer.java:492)
>     at java.lang.Integer.parseInt(Integer.java:527)
>     at org.apache.pdfbox.pdmodel.font.PDType1Font.getEncodingFromFont(PDType1Font.java:344)
>     at org.apache.pdfbox.pdmodel.font.PDType1Font.determineEncoding(PDType1Font.java:280)
>     at org.apache.pdfbox.pdmodel.font.PDFont.<init>(PDFont.java:181)
>     at org.apache.pdfbox.pdmodel.font.PDSimpleFont.<init>(PDSimpleFont.java:83)
>     at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:152)
>     at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:108)
>     at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java: 5)
>     at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:115)
>
> Thanks
> Scott
> --
> Scott Renton
> Digital Development
> Library and University Collections
> Argyle House, Floor F
> ext: 515219
>
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]

