Hi,

I used PDFBox 1.8.4. I went ahead and created an issue with JIRA and uploaded the PDF file there. I used most of my original email text.

Thanks,
Craig
________________________________
From: Tilman Hausherr <[email protected]>
To: [email protected]
Sent: Friday, March 14, 2014 2:52 AM
Subject: Re: Extracting text from PDF with no embedded fonts

Hi,

The best would be to create an issue with JIRA and upload the file there, if it isn't confidential.

Re "the latest", did you use a 1.8 version or a 2.0 version?

Tilman

On 10.03.2014 21:19, Craig Strong wrote:
> I have been using PDFBox to extract text from several different PDF files
> fine. I use the latest PDFBox app with ExtractText class. There is one PDF
> that PDFBox (and iText) fails to extract any text even though I can extract
> the text with Adobe Reader and also pdftotext.exe part of XPdf. I don't want
> to have to rely on using pdftotext.exe from a PC since this is part of an
> automated application. I think the error relates to an unknown font type and
> having to use the few fonts installed in the jar file. I tried running the
> API classes and trying to force a font from a certain location but I still
> got errors. I thought I loaded the font with the loadTTF method but I don't
> know if that did anything with the font. I would really like to have this
> working straight from the ExtractText class anyway. I'm thinking I might
> have to build my own after putting a bunch of Windows fonts somewhere and
> changing a properties file but I really don't know if that is the right
> direction I should be taking and I am new to PDFBox.
>
> Any ideas?
>
> Here are the errors I am getting. I tried this from both a Windows PC and
> our system but I get the same errors. The section starting
> processEncodedText and on repeats a few times so I just included the first
> entries.
>
> Mar 10, 2014 3:50:44 PM org.apache.pdfbox.pdmodel.font.PDFontFactory createFont
> WARNING: Substituting TrueType for unknown font subtype=
> Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
> WARNING: java.lang.NullPointerException
> Throwable occurred: java.lang.NullPointerException
>     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.loadDescriptorDictionary(PDTrueTypeFont.java:375)
>     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.ensureFontDescriptor(PDTrueTypeFont.java:221)
>     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.<init>(PDTrueTypeFont.java:119)
>     at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:121)
>     at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:204)
>     at org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:604)
>     at org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:54)
>     at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
>     at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
>     at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:456)
>     at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:381)
>     at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)
>     at org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:275)
>     at org.apache.pdfbox.ExtractText.main(ExtractText.java:85)
>     at org.apache.pdfbox.PDFBox.main(PDFBox.java:58)
> Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine processEncodedText
> WARNING: java.lang.NullPointerException
> Throwable occurred: java.lang.NullPointerException
>     at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:355)
>     at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
>     at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
>     at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
>     at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:456)
>     at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:381)
>     at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)
>     at org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:275)
>     at org.apache.pdfbox.ExtractText.main(ExtractText.java:85)
>     at org.apache.pdfbox.PDFBox.main(PDFBox.java:58)
> Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
> WARNING: java.lang.NullPointerException
> Throwable occurred: java.lang.NullPointerException
>     at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:364)
>     at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
>     at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
>     at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
>     at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:456)
>     at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:381)
>     at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)
>     at org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:275)
>     at org.apache.pdfbox.ExtractText.main(ExtractText.java:85)
>     at org.apache.pdfbox.PDFBox.main(PDFBox.java:58)
>
> Thanks,
> Craig Strong

