Hi Maruan,

Thanks for the swift response. It looks like it’s 1.6.0 (quite old?); that’s certainly the .jar that’s sitting in the dspace lib directory. I’ve copied in George as he’s investigating this too. George, I take it we’re OK to send Maruan a link to the relevant records in the repository?

Cheers
Scott
--
Scott Renton
Digital Development
Library and University Collections
Argyle House, Floor F
ext: 515219

On 01/06/2017 11:18, "Maruan Sahyoun" <[email protected]> wrote:

>Hi Scott,
>
>which version of PDFBox are you using? Is it possible to share one of the PDFs
>at a public location?
>
>BR
>Maruan
>
>> On 01.06.2017 at 12:11, RENTON Scott <[email protected]> wrote:
>>
>> Hi folks (apologies, hit send too soon)
>>
>> We run PDFBox for PDF text extraction under the DSpace application.
>>
>> Occasionally we get the odd failure, and we’re investigating some errors
>> just now. I’m just wondering what property of the PDF in question it’s
>> looking at here, and if there’s any way we can mitigate that. It’s
>> certainly not the title.
>>
>> One is:
>> java.lang.RuntimeException: java.io.IOException: Not a number: +
>> java.lang.RuntimeException: java.io.IOException: Not a number: +
>> at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:178)
>> at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:187)
>> at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:266)
>> at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
>> at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
>> at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:442)
>> at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:366)
>> at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:322)
>> at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:101)
>>
>> And here’s another:
>>
>> java.lang.NumberFormatException: For input string: "dup"
>> java.lang.NumberFormatException: For input string: "dup"
>> at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>> at java.lang.Integer.parseInt(Integer.java:492)
>> at java.lang.Integer.parseInt(Integer.java:527)
>> at org.apache.pdfbox.pdmodel.font.PDType1Font.getEncodingFromFont(PDType1Font.java:344)
>> at org.apache.pdfbox.pdmodel.font.PDType1Font.determineEncoding(PDType1Font.java:280)
>> at org.apache.pdfbox.pdmodel.font.PDFont.<init>(PDFont.java:181)
>> at org.apache.pdfbox.pdmodel.font.PDSimpleFont.<init>(PDSimpleFont.java:83)
>> at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:152)
>> at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:108)
>> at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:
>> 5)
>> at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:115)
>>
>> Thanks
>> Scott
>> --
>> Scott Renton
>> Digital Development
>> Library and University Collections
>> Argyle House, Floor F
>> ext: 515219
>>
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]

