(null) printed when characters cannot be decoded during text extraction
-----------------------------------------------------------------------
Key: PDFBOX-373
URL: https://issues.apache.org/jira/browse/PDFBOX-373
Project: PDFBox
Issue Type: Bug
Components: Parsing
Affects Versions: 0.8.0-incubator
Reporter: Brian Carrier
Fix For: 0.8.0-incubator
We have some PDF files where the TO_UNICODE map is corrupt, so PDFBox cannot
decode some characters during text extraction. For those characters
font.encode() returns null, and PDFStreamEngine.showString() appends the null
to the result, which is then printed as "(null)".
Here is a patch (against the trunk) that replaces the null with "?".
--- PDFStreamEngine.java 2008-09-17 16:09:13.529318500 -0400
+++ PDFStreamEngine-new.java 2008-09-17 16:12:51.617318500 -0400
@@ -422,6 +422,11 @@
                     }
                 }
+                // Replace a null entry with "?" so it is not printed as "(null)"
+                if (c == null)
+                {
+                    c = "?";
+                }
                 totalStringWidth += width;
                 stringResult.append( c );
             }
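
For anyone who wants to try the idea outside PDFBox, here is a minimal
standalone sketch of the same null check. It is not PDFBox code: the class
NullSafeExtraction, the extract() method and the Map standing in for the
TO_UNICODE lookup are all hypothetical names made up for illustration. The
only assumption carried over from the patch is that a failed lookup yields
null and should be appended as "?".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the text-extraction loop; this is not PDFBox code.
class NullSafeExtraction
{
    // Placeholder appended when a character code cannot be decoded.
    static final String REPLACEMENT = "?";

    // Decodes each character code through the map and substitutes "?"
    // whenever the map has no entry for that code.
    static String extract( int[] codes, Map<Integer, String> toUnicode )
    {
        StringBuilder result = new StringBuilder();
        for( int code : codes )
        {
            String c = toUnicode.get( code ); // null when the code is unmapped
            if( c == null )
            {
                c = REPLACEMENT;
            }
            result.append( c );
        }
        return result.toString();
    }

    public static void main( String[] args )
    {
        Map<Integer, String> toUnicode = new HashMap<>();
        toUnicode.put( 72, "H" );
        toUnicode.put( 105, "i" );
        // Code 0 has no mapping, so it is replaced by "?".
        System.out.println( extract( new int[] { 72, 0, 105 }, toUnicode ) ); // prints "H?i"
    }
}
```

Doing the check right before the append, as the patch does, keeps the
substitution in one place instead of requiring every decoder to return a
non-null value.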