[ https://issues.apache.org/jira/browse/PDFBOX-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919311#action_12919311 ]
Hendrik Lescak edited comment on PDFBOX-855 at 10/8/10 11:41 AM:
-----------------------------------------------------------------

The problem occurred in a Word document containing an embedded "VISIO-Express Drawing-Object". In that document, the text was not recognized as correctly as it was before your patch. Maybe this issue is out of scope.

Unfortunately, I cannot post the example file; it is from a customer project.

> Extracted Text of MS Word generated PDFs corrupt
> ------------------------------------------------
>
>                 Key: PDFBOX-855
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-855
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.3.0
>         Environment: All
>            Reporter: Hendrik Lescak
>
> Since revision 1003195 (PDFBOX-828: fixed some issues with positioning when
> extracting or rendering text), text extraction with PDFTextStripper
> behaves differently for PDF documents generated with the MS Office Word 2007
> "Save as PDF" feature.
> For example, the term "Fachbereichsleiter" changed to "F a c hb e re ic hsle ite r"
> after PDFBOX-828.
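The symptom described above can be checked mechanically: the corrupted output still contains the expected term once all spaces are removed ("F a c hb e re ic hsle ite r" collapses to "Fachbereichsleiter"). The following is a minimal, hypothetical sketch of such a regression check; the class `SpacingCheck` and the method `isSpuriouslySpaced` are illustrative names, not part of PDFBox. In a real test, the `extracted` string would come from `PDFTextStripper.getText(...)` on the affected document; here it is hard-coded from the strings quoted in the issue.

```java
/**
 * Hypothetical regression check (not part of PDFBox): detects the symptom
 * reported in PDFBOX-855, where extracted text contains the expected term
 * only after removing spurious spaces, e.g. "F a c hb e re ic hsle ite r".
 */
public final class SpacingCheck {

    private SpacingCheck() {}

    /** Removes all whitespace characters (spaces, tabs, line breaks). */
    static String stripWhitespace(String s) {
        return s.replaceAll("\\s+", "");
    }

    /**
     * Returns true if {@code extracted} does not contain {@code expected}
     * verbatim, but does contain it once all whitespace is stripped.
     */
    public static boolean isSpuriouslySpaced(String extracted, String expected) {
        if (extracted.contains(expected)) {
            return false; // term was extracted correctly
        }
        return stripWhitespace(extracted).contains(stripWhitespace(expected));
    }

    public static void main(String[] args) {
        String before = "Fachbereichsleiter";          // output before revision 1003195
        String after = "F a c hb e re ic hsle ite r";  // output after PDFBOX-828
        System.out.println(isSpuriouslySpaced(before, "Fachbereichsleiter")); // false
        System.out.println(isSpuriouslySpaced(after, "Fachbereichsleiter"));  // true
    }
}
```

Comparing against a whitespace-stripped copy, rather than trying to re-join fragments, keeps the check independent of where exactly the spurious spaces are inserted.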