Hi,

> Frank van der Hulst <[email protected]> wrote on 17 March 2016 at 08:34:
>
> Spaces don't exist as characters in PDFs. To identify spaces, you have to
> compare the X coordinates of adjacent characters against their widths.

That's not quite correct. Space characters can exist in a PDF, but in most
cases the software that produces the PDF omits them and instead splits the
text into pieces that are placed with explicit horizontal positioning.
BTW, LaTeX uses the same strategy. Here is an excerpt from your PDF:

  [ (W) 55 (ith) -383 (due) -384 (r) 18 (egar) 18 (d) -383 (to) -383
    (Article) -384 (\(219\),) -416 (the) -384 (competent) -383
    (authority) -383 (has) -384 (the) -383 (right) ] TJ

The text is inside the parentheses, and the numbers between the strings are
used for horizontal positioning. They are given in thousandths of a text
space unit and are subtracted from the current position, so the small
positive values (55, 18) are kerning and the large negative values (around
-383) are the gaps between words.

BR
Andreas

> On Thu, Mar 17, 2016 at 7:12 PM, Hesham G. <[email protected]> wrote:
>
> > Hello,
> >
> > I have a PDF file created using LaTeX. I am trying to read and print all
> > letters in that file using PDFBox, but when doing this all spaces in that
> > file are ignored. Here is the code I am using:
> >
> > PDPage page = (PDPage)allPages.get( 0 );
> > PDStream contents = page.getContents();
> > if ( contents != null ) {
> >     PDFTextStripperProcessor pdfTextStripperProcessor = new
> >         PDFTextStripperProcessor();
> >     pdfTextStripperProcessor.processStream( page, page.findResources(),
> >         contents.getStream() );
> > }
> >
> > public class PDFTextStripperProcessor extends PDFTextStripper {
> >     @Override
> >     public void processTextPosition( TextPosition text ) {
> >         System.out.println( text.getCharacter() );
> >     }
> > }
> >
> > And you can check a one page file sample here to test it:
> >
> > https://dl.dropboxusercontent.com/u/10111483/downloads/pdfbox/pdf_latex_spaces_ignored.pdf
> >
> > What is the cause of this issue please?
> >
> > Best regards,
> > Hesham
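To make the TJ excerpt from the reply above a bit more concrete, here is a rough sketch in plain Java (no PDFBox involved) that walks such an array and emits a space wherever the adjustment is a large negative number. The class name TjSpaces, the method decode and the cutoff of -200 are my own assumptions for illustration. They are not part of PDFBox or of the PDF specification, and a real extractor would derive the cutoff from the font's actual space width. The sketch also assumes the string bytes map directly to readable characters, which holds for this excerpt but not for every font encoding.

```java
// Hypothetical helper, for illustration only: rebuilds word spacing from the
// operand of a TJ operator such as the excerpt in this thread.
class TjSpaces {

    // Assumed cutoff in thousandths of a text space unit. Adjustments at or
    // below this value are treated as word gaps, anything else as kerning.
    static final double SPACE_THRESHOLD = -200.0;

    static String decode(String tjArray) {
        StringBuilder out = new StringBuilder();
        int n = tjArray.length();
        int i = 0;
        while (i < n) {
            char c = tjArray.charAt(i);
            if (c == '(') {
                // Literal string: copy characters up to the matching ')'.
                // Only simple escapes such as \( \) and \\ are handled here;
                // octal escapes and \n style escapes are not.
                i++;
                int depth = 1;
                while (i < n) {
                    char d = tjArray.charAt(i);
                    if (d == '\\' && i + 1 < n) {
                        out.append(tjArray.charAt(i + 1));
                        i += 2;
                        continue;
                    }
                    if (d == '(') {
                        depth++;
                    } else if (d == ')') {
                        depth--;
                        if (depth == 0) {
                            i++;
                            break;
                        }
                    }
                    out.append(d);
                    i++;
                }
            } else if (c == '-' || c == '.' || Character.isDigit(c)) {
                // Number: a position adjustment between two strings.
                int start = i;
                i++;
                while (i < n && (Character.isDigit(tjArray.charAt(i))
                        || tjArray.charAt(i) == '.')) {
                    i++;
                }
                double adjustment = Double.parseDouble(tjArray.substring(start, i));
                if (adjustment <= SPACE_THRESHOLD) {
                    out.append(' ');
                }
            } else {
                // Brackets, whitespace and the operator name are skipped.
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String excerpt = "[ (W) 55 (ith) -383 (due) -384 (r) 18 (egar) 18 (d) -383 (to) -383 "
                + "(Article) -384 (\\(219\\),) -416 (the) -384 (competent) -383 "
                + "(authority) -383 (has) -384 (the) -383 (right) ] TJ";
        // prints: With due regard to Article (219), the competent authority has the right
        System.out.println(decode(excerpt));
    }
}
```

As far as I know, PDFTextStripper applies a position-based heuristic of its own when you let it assemble the text (for example via getText), so printing each TextPosition directly from processTextPosition bypasses that step, which would explain the missing spaces.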

