[jira] Created: (PDFBOX-373) (null) printed when characters cannot be decoded during text extraction

Brian Carrier (JIRA) Wed, 17 Sep 2008 13:24:08 -0700

(null) printed when characters cannot be decoded during text extraction
-----------------------------------------------------------------------


                 Key: PDFBOX-373
                 URL: https://issues.apache.org/jira/browse/PDFBOX-373
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing
    Affects Versions: 0.8.0-incubator
            Reporter: Brian Carrier
             Fix For: 0.8.0-incubator


We have some PDF files in which the TO_UNICODE map is corrupt, so PDFBox cannot
extract the text.  For the affected characters, font.encode() returns null, and
PDFStreamEngine.showString() appends that null to the result, which is then
printed as "(null)".

Here is a patch (against the trunk) that replaces the null with "?".  

--- PDFStreamEngine.java        2008-09-17 16:09:13.529318500 -0400
+++ PDFStreamEngine-new.java    2008-09-17 16:12:51.617318500 -0400
@@ -422,6 +422,11 @@
                 }
             }
 
+            // Replace a null entry with "?" so it is not printed as "(null)"
+            if (c == null)
+            {
+                c = "?";
+            }
             totalStringWidth += width;
             stringResult.append( c );
         }
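
For readers without the PDFBox source at hand, the effect of the guard can be
sketched in isolation. This is only an illustration, not PDFBox code: the class
NullGlyphDemo, the method joinDecoded, and the sample input are hypothetical
stand-ins for the loop in showString(), and the list of strings stands in for
the per-character results of font.encode(). In plain Java, appending a null
String to a StringBuilder yields the four characters "null", which is the kind
of leak the patch is meant to prevent.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone demo; none of these names exist in PDFBox.
public class NullGlyphDemo {
    // Placeholder emitted for characters that could not be decoded.
    static final String PLACEHOLDER = "?";

    // Concatenates decoded characters, substituting the placeholder for
    // null entries, mirroring the guard added by the patch.
    static String joinDecoded(List<String> decoded) {
        StringBuilder result = new StringBuilder();
        for (String c : decoded) {
            if (c == null) {
                c = PLACEHOLDER;
            }
            result.append(c);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // The third entry simulates a character the font could not decode.
        List<String> decoded = Arrays.asList("P", "D", null, "F");
        System.out.println(joinDecoded(decoded)); // prints "PD?F"
    }
}
```

Without the null check, the same input would produce "PDnullF", since
StringBuilder.append(String) writes the text "null" for a null argument.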


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
