Which Tesseract API are you using? I use TesseractExtractResult(), and the sequence of boxes and characters it returns NEVER includes newlines. Newlines and spaces come back identically, as spaces, and our code uses the bounding boxes to decide which of those "spaces" should become newlines. It is not very complicated code, but it is something that has to be done.
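To give an idea of the bounding-box logic, here is a minimal sketch in Python. It is not our actual code, and the names in it (CharBox, restore_newlines) are hypothetical. It assumes each returned character comes with left/top/right/bottom pixel coordinates, that y grows downward from the top of the image, and that the boxes attached to the space entries themselves can be ignored. None of that is taken from what TesseractExtractResult() really returns, so adjust it to your data.

```python
from typing import List, NamedTuple, Optional


class CharBox(NamedTuple):
    """One recognized character and its box (hypothetical layout, y grows downward)."""
    char: str
    left: int
    top: int
    right: int
    bottom: int


def restore_newlines(boxes: List[CharBox]) -> str:
    """Rebuild text, turning a space into a newline when the next character
    starts further left and clearly below the previous character."""
    out: List[str] = []
    prev: Optional[CharBox] = None  # last non-space character seen
    pending_space = False           # a space was seen since `prev`

    for box in boxes:
        if box.char == " ":
            # Remember the separator; decide what it means at the next character.
            pending_space = True
            continue

        if prev is not None and pending_space:
            # Assumed heuristic: x jumps back to the left AND the vertical
            # centre of this character lies below the previous box.
            moved_back = box.left < prev.right
            moved_down = (box.top + box.bottom) / 2 > prev.bottom
            out.append("\n" if moved_back and moved_down else " ")

        out.append(box.char)
        prev = box
        pending_space = False

    return "".join(out)
```

With characters "a" and "b" on one row, a space, "c" on the same row, another space, and "d" back at the left margin on a lower row, this returns "ab c" and "d" on separate lines. The thresholds are deliberately crude; skewed scans or very tight line spacing would likely need something more careful.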
I know nothing about the other APIs. Based on Jimmy's answer, it seems they do return text organized in lines; I am just not familiar with them.

Patrick

On Jul 2, 11:49 am, KAH <[email protected]> wrote:
> I am trying to figure out why tesseract is not reading this image as
> two lines?
> Is there a variable I can set that will let me tell the process to see
> the vertical space as space and not treat it all as one word?
>
> Here is the image I am trying to
> read: http://dl.dropbox.com/u/1531272/pg1-CROP.jpg
>
> Thanks for any help you can offer as I try to tweak this awesome
> product.

