[
https://issues.apache.org/jira/browse/PDFBOX-292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Lehmkühler resolved PDFBOX-292.
---------------------------------------
Resolution: Fixed
Fix Version/s: 0.8.0-incubator
Issue resolved. Thanks to Justin for testing.
> Text Extraction strips 1 char when extracting a twin pair
> ---------------------------------------------------------
>
> Key: PDFBOX-292
> URL: https://issues.apache.org/jira/browse/PDFBOX-292
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Priority: Minor
> Fix For: 0.8.0-incubator
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1761581
> Originally submitted by nalundgaard on 2007-07-26 14:11.
> See attached file. We found a bug in PDFBox where it appears to randomly
> delete one character from a pair of identical adjacent characters (a "twin
> pair"). For example, we've noticed that what shows up as 1001 in a PDF file
> (in Acrobat) may become 101 in the text output of TextStripper.exe. This
> appears to happen to a large number of twin pairs, as evidenced by the
> attached file.
> Note that the file was created using Microsoft Word 11.3.5 on Mac OS X using
> the "print to PDF" feature of Mac OS X 10.4.10.
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1761581&file_id=238687
> doublesTest.zip (application/zip), 40221 bytes
> This zip file contains a test PDF file and the text output from running
> ExtractText.exe on it in versions 0.7.2 and 0.7.3.
> [comment on SourceForge]
> Originally sent by ibuzz.
> I had the same issue with a PDF document created with Microsoft Word 2004 for
> Mac OS X. No problem with Word X.