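Spaces often don't exist as characters in PDFs. LaTeX output in particular usually positions each word explicitly instead of drawing a space glyph. To identify word breaks, you have to compare the X coordinate of each character with the end position (X coordinate plus width) of the character before it.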
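Here is a minimal sketch of that comparison in plain Java. It assumes each character's text, left X coordinate and width have already been extracted. In PDFBox, TextPosition's getXDirAdj() and getWidthDirAdj() are one source of those values. The Glyph class, the join method and the 0.3 gap factor are hypothetical illustrations, not part of PDFBox:

```java
import java.util.ArrayList;
import java.util.List;

public class SpaceInference {

    // Hypothetical minimal glyph record. In PDFBox these values could come from
    // TextPosition.getCharacter(), getXDirAdj() and getWidthDirAdj().
    static final class Glyph {
        final String text;
        final float x;      // left edge of the glyph
        final float width;  // advance width of the glyph

        Glyph(String text, float x, float width) {
            this.text = text;
            this.x = x;
            this.width = width;
        }
    }

    /**
     * Rebuilds a line of text, inserting a space wherever the gap between the
     * end of one glyph and the start of the next exceeds a fraction of the
     * font's space width. The 0.3 factor is an assumed heuristic, not a
     * PDFBox constant.
     */
    static String join(List<Glyph> glyphs, float spaceWidth) {
        StringBuilder sb = new StringBuilder();
        float threshold = spaceWidth * 0.3f;
        Glyph prev = null;
        for (Glyph g : glyphs) {
            if (prev != null) {
                float prevEnd = prev.x + prev.width; // where the previous glyph stops
                float gap = g.x - prevEnd;           // horizontal gap to this glyph
                if (gap > threshold) {
                    sb.append(' ');                  // gap is wide enough to be a word break
                }
            }
            sb.append(g.text);
            prev = g;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Glyph> line = new ArrayList<>();
        // "Hi" followed by a 3-unit gap and then "you". No space glyph is present.
        line.add(new Glyph("H", 0f, 5f));
        line.add(new Glyph("i", 5f, 2f));
        line.add(new Glyph("y", 10f, 4f));
        line.add(new Glyph("o", 14f, 4f));
        line.add(new Glyph("u", 18f, 4f));
        System.out.println(join(line, 3f)); // prints "Hi you"
    }
}
```

PDFTextStripper's own text output already applies a position-based heuristic of this kind (tunable with setSpacingTolerance). Overriding processTextPosition and printing each character directly skips that step, which is why no spaces show up. Calling getText() on the stripper typically produces spaced output without any custom code.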
On Thu, Mar 17, 2016 at 7:12 PM, Hesham G. wrote:
> Hello ,
>
> I have a PDF file created using Latex. I am trying to read and print all
> letters in that file using PDFBox, but when doing this all spaces in that
> file are ignored. Here is the code I am using:
>
> PDPage page = (PDPage)allPages.get( 0 );
> PDStream contents = page.getContents();
> if ( contents != null ) {
>     PDFTextStripperProcessor pdfTextStripperProcessor = new PDFTextStripperProcessor();
>     pdfTextStripperProcessor.processStream( page, page.findResources(), contents.getStream() );
> }
>
> public class PDFTextStripperProcessor extends PDFTextStripper {
>     @Override
>     public void processTextPosition( TextPosition text ) {
>         System.out.println( text.getCharacter() );
>     }
> }
>
> And you can check a one page file sample here to test it:
>
> https://dl.dropboxusercontent.com/u/10111483/downloads/pdfbox/pdf_latex_spaces_ignored.pdf
>
> What is the cause of this issue please?
>
>
> Best regards ,
> Hesham

