[ https://issues.apache.org/jira/browse/PDFBOX-146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jukka Zitting updated PDFBOX-146:
---------------------------------

         Priority: Blocker
         Reporter: Jukka Zitting
    Fix Version/s: 0.8.0-incubator

> Document does not separate words
> --------------------------------
>
>                 Key: PDFBOX-146
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-146
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>            Reporter: Jukka Zitting
>            Priority: Blocker
>             Fix For: 0.8.0-incubator
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1460863
> Originally submitted by lubosp on 2006-03-29 10:31.
> The following document, taken from Project Gutenberg
> (http://www.gutenberg.org/), is extracted without spaces
> between words. I debugged PDFBox, and
> PDFStreamEngine.showString() never receives the space
> character (code 32). It is possible that the document
> contains no space characters at all (it looks that way).
> If so, is there a way to force the insertion of spaces as
> word separators? I am not sure whether this should be part
> of PDFStreamEngine.showString() or whether it is the
> user's responsibility in this case.
> The file is too big to attach; here is the link:
> http://www.gutenberg.org/dirs/etext98/pandp12p.pdf
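
On the question of forcing spaces when the document contains no space characters: one common heuristic is to compare the horizontal gap between consecutive glyphs against a fraction of the average glyph width, and emit a space when the gap is large. The sketch below illustrates only that idea. It does not use the PDFBox API; the Glyph class, the GAP_RATIO value of 0.3, and the joinWithSpaces method are hypothetical names and values chosen for illustration, and the glyphs are assumed to be on one line, sorted left to right.

```java
import java.util.Arrays;
import java.util.List;

public class GapSpaceInserter {

    /** Hypothetical stand-in for one positioned glyph; not a PDFBox type. */
    public static final class Glyph {
        final String text;
        final float x;      // left edge of the glyph
        final float width;  // advance width of the glyph

        public Glyph(String text, float x, float width) {
            this.text = text;
            this.x = x;
            this.width = width;
        }
    }

    /** Assumed tuning value: a gap wider than 30% of the average glyph width counts as a word break. */
    public static final float GAP_RATIO = 0.3f;

    /** Joins glyphs of a single line, inserting a space wherever the horizontal gap is large. */
    public static String joinWithSpaces(List<Glyph> glyphs) {
        if (glyphs.isEmpty()) {
            return "";
        }
        // Average width over the line, so one narrow glyph does not skew the threshold.
        float total = 0f;
        for (Glyph g : glyphs) {
            total += g.width;
        }
        float threshold = GAP_RATIO * (total / glyphs.size());

        StringBuilder out = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : glyphs) {
            if (prev != null) {
                float gap = g.x - (prev.x + prev.width);
                if (gap > threshold) {
                    out.append(' ');
                }
            }
            out.append(g.text);
            prev = g;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Glyph> line = Arrays.asList(
                new Glyph("I", 0f, 5f),
                new Glyph("t", 5.2f, 5f),
                new Glyph("i", 14f, 5f),   // 3.8 units after the end of "t"
                new Glyph("s", 19.1f, 5f));
        System.out.println(joinWithSpaces(line)); // prints "It is"
    }
}
```

Whether such a check belongs inside PDFStreamEngine.showString() or in user code is left open here, as in the report. A real implementation would also need to handle line breaks, rotated text, and fonts with unusual spacing, none of which this sketch attempts.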