"RE: Why would PDFTextStripper.getText() generate a NullPointerException?"
Follow Up on my previous post:

The Stack Trace that I am getting:
===============================================================
Exception in thread "main" java.lang.NullPointerException
    at org.apache.pdfbox.pdmodel.graphics.PDGraphicsState.<init>(PDGraphicsState.java:83)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:201)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:367)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:291)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:247)
    at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)


Description of my PDF that is having the problem:
===================================================
The PDF (which I am not allowed to share) was created by a Scanner that
OCR'd data from a FAXed page.
The original FAX page was a Govt Form which had poor scan quality to
begin with.

As a result most of the lines that make up the Form's "boxes" are faded
/ incomplete when FAXed.
Fortunately, the original Text Content is clearly typed and is correctly
represented as text within the PDF Document.
(I can actually copy/paste it from Acrobat Reader, for example.)

However, the PDF does contain much of the Govt Form's original fuzzy
outline.
Most of these graphical lines are faded/incomplete.
Could these lines be causing a problem with text extraction?


The Sample Code that I based my simple Java Class on:
===========================================================
http://www.java-forums.org/advanced-java/8546-reading-text-using-pdfbox.html

PDDocument pddDocument = PDDocument.load(new File("a.pdf"));
PDFTextStripper textStripper = new PDFTextStripper();
System.out.println(textStripper.getText(pddDocument));
pddDocument.close();
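
Since the trace passes through processPage, one way to narrow the problem down might be to extract one page at a time and note which pages throw. The following is only a sketch of that idea: PageProbe, PageExtractor and findFailingPages are hypothetical names, not part of PDFBox, and the extractor in main is a stand-in that simulates a failure so the example runs without any PDF.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: probes pages one at a time.
class PageProbe {

    // Hypothetical callback: returns the text of one page (1-based); may throw.
    interface PageExtractor {
        String extract(int pageNumber) throws Exception;
    }

    // Returns the 1-based numbers of the pages whose extraction threw.
    static List<Integer> findFailingPages(int pageCount, PageExtractor extractor) {
        List<Integer> failing = new ArrayList<>();
        for (int page = 1; page <= pageCount; page++) {
            try {
                extractor.extract(page);
            } catch (Exception e) {
                // Unchecked exceptions such as NullPointerException land here too.
                failing.add(page);
            }
        }
        return failing;
    }

    public static void main(String[] args) {
        // Stand-in extractor that fails on page 2 to simulate the NPE.
        List<Integer> failing = findFailingPages(3, page -> {
            if (page == 2) {
                throw new NullPointerException("simulated failure");
            }
            return "text of page " + page;
        });
        System.out.println("Failing pages: " + failing);
    }
}
```

With PDFBox, the extractor body could call textStripper.setStartPage(page) and textStripper.setEndPage(page) before textStripper.getText(pddDocument), with the page count taken from pddDocument.getNumberOfPages(), assuming the PDFBox version in use exposes those methods. This only shows which pages fail, not why they fail.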

(I am using the latest version of PDFBox and its supporting jar file for
fo
