aPage.findCropBox() in PDFStreamEngine.java line 202 (on HEAD tag) is returning null. That null is passed into the PDGraphicsState constructor, which then throws the exception you're seeing.
I'm guessing that the page doesn't have a crop box, nor does its parent, nor does it have a media box. This may not be a valid PDF, but I'm not sure, as I haven't looked into the page layout part of the specification. Since I'm not familiar with the spec, I can't really suggest a fix for this.

Can you open the PDF with another program like Adobe Reader? If it works there, then there should be some sane solution for PDFBox even if the PDF is out of spec.

As a quick test, you could add an if(page != null) check in PDGraphicsState.java before line 83. That's not a proper solution, but it might at least give you more info, and perhaps someone else on the list will have an idea of where to go next. Of course, you'll have to compile PDFBox to do this (not sure if you're doing that now or just using the 1.1.0 jar files).

Hope this helps you at least a little.

--Adam

From: "Lupton, Chris B." <[email protected]>
To: <[email protected]>
Cc: "Lupton, Chris B." <[email protected]>
Date: 05/11/2010 14:09
Subject: RE: Why would PDFTextStripper.getText() generate a NullPointerException ?
Follow Up on my previous post:

The Stack Trace that I am getting:
===============================================================
Exception in thread "main" java.lang.NullPointerException
    at org.apache.pdfbox.pdmodel.graphics.PDGraphicsState.<init>(PDGraphicsState.java:83)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:201)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:367)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:291)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:247)
    at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)

Description of my PDF that is having the problem:
===================================================
The PDF (which I am not allowed to share) was created by a Scanner that OCR'd data from a FAXed page. The original FAX page was a Govt Form which had poor scan quality to begin with. As a result most of the lines that make up the Form's "boxes" are faded / incomplete when FAXed.

Fortunately, the original Text Content is clearly typed and is correctly represented as text within the PDF Document. (I can actually copy/paste it from Acrobat Reader, for example). However, the PDF does contain much of the Govt Form's original fuzzy outline. Most of these graphical lines are faded/incomplete. Would these lines cause a problem with Text Extraction perhaps ?

The Sample Code that I based my simple Java Class on:
===========================================================
http://www.java-forums.org/advanced-java/8546-reading-text-using-pdfbox.html

    PDDocument pddDocument=PDDocument.load(new File("a.pdf"));
    PDFTextStripper textStripper=new PDFTextStripper();
    System.out.println(textStripper.getText(pddDocument));
    pddDocument.close();

(I am using the latest version of PDFBox and its supporting jar file for fo ?
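
For anyone following the thread, the situation guessed at in the reply (no crop box on the page or its parent, and no media box either) can be sketched roughly as below. This is only an illustration of the lookup-and-fallback idea, not PDFBox's actual code: Box, Node, findInherited and findCropBoxOrDefault are hypothetical names made up for this sketch, and falling back to a US Letter box is an arbitrary assumption rather than something the spec or PDFBox is known to prescribe.

```java
// Hypothetical stand-ins for illustration only; these are NOT PDFBox classes.
final class BoxLookupSketch {

    /** A rectangle in PDF user-space units (1/72 inch). */
    static final class Box {
        final float width;
        final float height;

        Box(float width, float height) {
            this.width = width;
            this.height = height;
        }
    }

    /** A node in the page tree; parent, cropBox and mediaBox may each be null. */
    static final class Node {
        final Node parent;
        final Box cropBox;
        final Box mediaBox;

        Node(Node parent, Box cropBox, Box mediaBox) {
            this.parent = parent;
            this.cropBox = cropBox;
            this.mediaBox = mediaBox;
        }
    }

    /** Assumed fallback for this sketch: US Letter, 8.5 x 11 inches. */
    static final Box US_LETTER = new Box(612f, 792f);

    /** Walks from the node up through its ancestors; returns the first box found, or null. */
    static Box findInherited(Node node, boolean wantCropBox) {
        for (Node n = node; n != null; n = n.parent) {
            Box box = wantCropBox ? n.cropBox : n.mediaBox;
            if (box != null) {
                return box;
            }
        }
        return null;
    }

    /** Crop box if any is inherited, else the media box; null when neither exists anywhere. */
    static Box findCropBox(Node page) {
        Box crop = findInherited(page, true);
        return crop != null ? crop : findInherited(page, false);
    }

    /** Null-safe variant: substitutes the assumed default instead of returning null. */
    static Box findCropBoxOrDefault(Node page) {
        Box box = findCropBox(page);
        return box != null ? box : US_LETTER;
    }

    public static void main(String[] args) {
        // A page whose tree defines no boxes at all, as guessed for the problem PDF.
        Node root = new Node(null, null, null);
        Node page = new Node(root, null, null);

        System.out.println("plain lookup is null: " + (findCropBox(page) == null));
        System.out.println("fallback width: " + findCropBoxOrDefault(page).width);
    }
}
```

A caller that dereferences the result of the plain lookup would hit a NullPointerException for such a page, while the null-safe variant keeps going with a guessed page size. Whether a guessed size is acceptable for text extraction is a separate question that the thread does not settle.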

