[ https://issues.apache.org/jira/browse/PDFBOX-349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brian Carrier resolved PDFBOX-349.
----------------------------------

    Resolution: Fixed

Fix checked into trunk.

Sending        trunk/src/main/java/org/apache/pdfbox/util/PDFStreamEngine.java
Sending        trunk/src/main/java/org/apache/pdfbox/util/PDFTextStripper.java
Sending        trunk/test/input/10101-AR.pdf-sorted.txt
Sending        trunk/test/input/10101-AR.pdf.txt
Sending        trunk/test/input/601501018.pdf-sorted.txt
Sending        trunk/test/input/Exolab.pdf-sorted.txt
Sending        trunk/test/input/Exolab.pdf.txt
Sending        trunk/test/input/Garcia2003b__Correlative_exploration_of_EEG_Signals.pdf-sorted.txt
Sending        trunk/test/input/Garcia2003b__Correlative_exploration_of_EEG_Signals.pdf.txt
Sending        trunk/test/input/Garcia2004_thesis.pdf-sorted.txt
Sending        trunk/test/input/Garcia2004_thesis.pdf.txt
Sending        trunk/test/input/JavaMail-1.2.pdf-sorted.txt
Sending        trunk/test/input/JavaMail-1.2.pdf.txt
Sending        trunk/test/input/Michel2001__Review_p2_structured.pdf-sorted.txt
Sending        trunk/test/input/Michel2001__Review_p2_structured.pdf.txt
Sending        trunk/test/input/OSP_framework.pdf-sorted.txt
Sending        trunk/test/input/OSP_framework.pdf.txt
Sending        trunk/test/input/SphericalHomeomorphism.pdf-sorted.txt
Sending        trunk/test/input/SphericalHomeomorphism.pdf.txt
Sending        trunk/test/input/T05140.pdf-sorted.txt
Sending        trunk/test/input/T05140.pdf.txt
Sending        trunk/test/input/amyuni2_05d__pdf1_3_acro4x.pdf-sorted.txt
Sending        trunk/test/input/amyuni2_05d__pdf1_3_acro4x.pdf.txt
Sending        trunk/test/input/authentication.pdf-sorted.txt
Sending        trunk/test/input/authentication.pdf.txt
Sending        trunk/test/input/c21-5916 .pdf-sorted.txt
Sending        trunk/test/input/c21-5916 .pdf.txt
Sending        trunk/test/input/cweb.pdf-sorted.txt
Sending        trunk/test/input/cweb.pdf.txt
Sending        trunk/test/input/defensive_driving_class_schedule.pdf-sorted.txt
Sending        trunk/test/input/defensive_driving_class_schedule.pdf.txt
Sending        trunk/test/input/hexnumberproblem.pdf-sorted.txt
Sending        trunk/test/input/hexnumberproblem.pdf.txt
Sending        trunk/test/input/null_thread_bead.pdf-sorted.txt
Sending        trunk/test/input/null_thread_bead.pdf.txt
Sending        trunk/test/input/ocalc.pdf-sorted.txt
Sending        trunk/test/input/ocalc.pdf.txt
Sending        trunk/test/input/pdf_with_lots_of_fields.pdf-sorted.txt
Sending        trunk/test/input/pdf_with_lots_of_fields.pdf.txt
Sending        trunk/test/input/rc5.pdf-sorted.txt
Sending        trunk/test/input/rc5.pdf.txt
Sending        trunk/test/input/ruminations.pdf-sorted.txt
Sending        trunk/test/input/ruminations.pdf.txt
Sending        trunk/test/input/sample_fonts_solidconvertor.pdf-sorted.txt
Sending        trunk/test/input/sample_fonts_solidconvertor.pdf.txt
Sending        trunk/test/input/sha256.pdf-sorted.txt
Sending        trunk/test/input/sha256.pdf.txt
Sending        trunk/test/input/surface_interpolation.pdf-sorted.txt
Sending        trunk/test/input/surface_interpolation.pdf.txt
Sending        trunk/test/input/tech_report.pdf-sorted.txt
Sending        trunk/test/input/tech_report.pdf.txt
Sending        trunk/test/input/warp.pdf-sorted.txt
Sending        trunk/test/input/warp.pdf.txt
Transmitting file data .....................................................
Committed revision 760902.

> Spaces between words ignored in scanned pdf files
> -------------------------------------------------
>
>                 Key: PDFBOX-349
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-349
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>            Reporter: Jukka Zitting
>         Attachments: SpacingFix.zip, UpdatedSpacingRegressionFiles.zip
>
>
> [Issue from SourceForge]
> http://sourceforge.net/tracker/index.php?func=detail&aid=1922502&group_id=78314&atid=552832
> I am using PDF-Box-0.7.3.dll with C# and have tested extraction on two
> searchable pdfs that I have scanned in from paper. Spaces between words are
> ignored for both files. I have also tested another pdf file (which I
> downloaded from the internet) and it was parsed correctly. Unfortunately,
> the file is 1.2MB and the upload was blocked. Please send me an email
> (gkobz...@hotmail.com) and I will reply back with the file.
> Thanks for looking into this.
> Greg
> [Comment on SourceForge]
> Date: 2008-03-23 21:24
> Sender: gkobzeff
> Logged In: YES
> user_id=2042611
> Originator: YES
> I have scanned the file into a smaller file size. I have attached the file.
> Thanks
> File Added: Advanced Pain Mgmt BW.pdf
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&file_id=271548&aid=1922502

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.