"RE: Why would PDFTextStripper.getText() generate a NullPointerException?"

Follow-up on my previous post:
The Stack Trace that I am getting:
===============================================================
Exception in thread "main" java.lang.NullPointerException
    at org.apache.pdfbox.pdmodel.graphics.PDGraphicsState.<init>(PDGraphicsState.java:83)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:201)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:367)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:291)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:247)
    at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)

Description of my PDF that is having the problem:
===================================================
The PDF (which I am not allowed to share) was created by a Scanner that OCR'd data from a FAXed page. The original FAX page was a Govt Form which had poor scan quality to begin with. As a result, most of the lines that make up the Form's "boxes" are faded or incomplete when FAXed.

Fortunately, the original Text Content is clearly typed and is correctly represented as text within the PDF Document. (I can actually copy/paste it from Acrobat Reader, for example.)

However, the PDF does contain much of the Govt Form's original fuzzy outline. Most of these graphical lines are faded or incomplete. Would these lines perhaps cause a problem with Text Extraction?

The Sample Code that I based my simple Java Class on:
===========================================================
http://www.java-forums.org/advanced-java/8546-reading-text-using-pdfbox.html

    PDDocument pddDocument = PDDocument.load(new File("a.pdf"));
    PDFTextStripper textStripper = new PDFTextStripper();
    System.out.println(textStripper.getText(pddDocument));
    pddDocument.close();

(I am using the latest version of PDFBox and its supporting jar file for fo

