[ 
https://issues.apache.org/jira/browse/PDFBOX-146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting updated PDFBOX-146:
---------------------------------

         Priority: Blocker
         Reporter: Jukka Zitting
    Fix Version/s: 0.8.0-incubator

> Document does not separate words
> --------------------------------
>
>                 Key: PDFBOX-146
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-146
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>            Reporter: Jukka Zitting
>            Priority: Blocker
>             Fix For: 0.8.0-incubator
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1460863
> Originally submitted by lubosp on 2006-03-29 10:31.
> The following document, taken from Project Gutenberg
> (http://www.gutenberg.org/), does not separate words
> with spaces. I debugged PDFBox and found that
> PDFStreamEngine.showString() never receives the space
> (32) character. Is it possible that the space is not in
> the document (it looks that way)? If so, is there a way
> to force the insertion of spaces as word separators? I am
> not sure whether this should be part of
> PDFStreamEngine.showString() or is the user's
> responsibility in this case.
> The file is too big to attach; here is the link:
> http://www.gutenberg.org/dirs/etext98/pandp12p.pdf
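
If the document really contains no space characters, one possible approach to the question above is to infer word breaks from glyph positions: when the horizontal gap between the end of one glyph and the start of the next is large relative to the typical glyph width, emit a space. The following is only a minimal sketch of that heuristic under stated assumptions. The names PositionedGlyph, GapSpaceInserter, and gapFraction are hypothetical and are not part of the PDFBox API; the sketch also assumes the glyphs of a single line are already available in reading order with their x positions and widths.

```java
import java.util.List;

// Hypothetical holder type (not a PDFBox class): one decoded glyph
// together with its horizontal position and advance width.
final class PositionedGlyph {
    final String text;  // decoded character(s)
    final float x;      // left edge, in arbitrary but consistent units
    final float width;  // advance width, in the same units

    PositionedGlyph(String text, float x, float width) {
        this.text = text;
        this.x = x;
        this.width = width;
    }
}

// Hypothetical helper (not a PDFBox class) implementing the gap heuristic.
final class GapSpaceInserter {
    private GapSpaceInserter() {}

    /**
     * Joins the glyphs of one line, inserting a space wherever the gap
     * between consecutive glyphs exceeds gapFraction times the average
     * glyph width of the line. gapFraction is an assumed tuning knob.
     */
    static String join(List<PositionedGlyph> glyphs, float gapFraction) {
        if (glyphs.isEmpty()) {
            return "";
        }
        float totalWidth = 0f;
        for (PositionedGlyph g : glyphs) {
            totalWidth += g.width;
        }
        float threshold = gapFraction * (totalWidth / glyphs.size());

        StringBuilder out = new StringBuilder();
        PositionedGlyph prev = null;
        for (PositionedGlyph g : glyphs) {
            if (prev != null) {
                // Distance from the right edge of the previous glyph
                // to the left edge of the current one.
                float gap = g.x - (prev.x + prev.width);
                if (gap > threshold) {
                    out.append(' ');
                }
            }
            out.append(g.text);
            prev = g;
        }
        return out.toString();
    }
}
```

Whether such logic belongs inside the stream engine or in user code is exactly the open question in the report. Keeping it in a separate helper, as here, leaves the raw extraction untouched and makes the threshold easy to tune per document.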

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
