[ 
https://issues.apache.org/jira/browse/PDFBOX-292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler resolved PDFBOX-292.
---------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.8.0-incubator

Issue resolved. Thanks to Justin for testing.

> Text Extraction strips 1 char when extracting a twin pair
> ---------------------------------------------------------
>
>                 Key: PDFBOX-292
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-292
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>            Priority: Minor
>             Fix For: 0.8.0-incubator
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1761581
> Originally submitted by nalundgaard on 2007-07-26 14:11.
> See attached file. We found a bug in PDFBox where it appears to randomly 
> drop one character of a twin pair (two identical adjacent characters). 
> For example, we've noticed that what shows up as 1001 in a PDF file (in 
> Acrobat) may become 101 in the text output of TextStripper.exe. This appears 
> to happen to a large number of twin pairs, as evidenced by the attached file. 
> Note that the file was created using Microsoft Word 11.3.5 on Mac OS X using 
> the "print to PDF" feature of Mac OS X 10.4.10. 
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1761581&file_id=238687
> doublesTest.zip (application/zip), 40221 bytes
> This zip file contains a test PDF file and the text output from running 
> ExtractText.exe on it with versions 0.7.2 and 0.7.3.
> [comment on SourceForge]
> Originally sent by ibuzz.
> I had the same issue with a PDF document created with Microsoft Word 2004 for 
> Mac OS X. No problem with Word X.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
