Which Tesseract API are you using? I use TesseractExtractResult(), and
the sequence of boxes and characters it returns NEVER includes newlines.
Newlines and spaces are returned identically, as spaces, and our code
uses the bounding boxes to decide which of those "spaces" should
become newlines. The code is not very complicated, but it has to be
done.
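
To give an idea of that bounding-box logic, here is a minimal sketch in
Python. It is not our actual code: the CharBox type, the
restore_newlines name and the coordinate convention (origin at top
left, y growing downward) are assumptions made for illustration, and
none of them are part of the Tesseract API. A "space" is turned into a
newline when the next character sits entirely below the previous one
or jumps back to the left.

```python
from typing import List, NamedTuple


class CharBox(NamedTuple):
    """One recognized character with its bounding box (hypothetical type)."""
    char: str
    left: int
    top: int
    right: int
    bottom: int


def restore_newlines(boxes: List[CharBox]) -> str:
    """Rebuild text from character boxes, turning some spaces into newlines.

    Assumes every separator arrives as a " " character and that y grows
    downward. The boxes of the spaces themselves are ignored.
    """
    out = []
    prev = None            # last non-space character seen
    pending_space = False  # a separator was seen since prev

    for box in boxes:
        if box.char == " ":
            pending_space = True
            continue
        if pending_space and prev is not None:
            # New line if the character lies fully below the previous one,
            # or if its left edge moved back to the left of it.
            below = box.top >= prev.bottom
            jumped_back = box.left < prev.left
            out.append("\n" if (below or jumped_back) else " ")
        out.append(box.char)
        prev = box
        pending_space = False

    return "".join(out)


if __name__ == "__main__":
    sample = [
        CharBox("a", 0, 0, 10, 20),
        CharBox("b", 12, 0, 22, 20),
        CharBox(" ", 22, 0, 30, 20),
        CharBox("c", 30, 0, 40, 20),
        CharBox("d", 42, 0, 52, 20),
        CharBox(" ", 52, 0, 60, 20),
        CharBox("e", 0, 30, 10, 50),
        CharBox("f", 12, 30, 22, 50),
    ]
    print(restore_newlines(sample))
```

Real pages need more tolerance than this (skewed baselines, small
punctuation marks that do not overlap vertically with their
neighbours), so treat the two tests above as a starting point only.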

I know nothing about the other APIs. Based on Jimmy's answer, it seems
they do return text organized in lines; I am just not familiar with
them.

Patrick

On Jul 2, 11:49 am, KAH <[email protected]> wrote:
> I am trying to figure out why tesseract is not reading this image as
> two lines?
> Is there a variable I can set that will let me tell the process to see
> the vertical space as space and not treat it all as one word?
>
> Here is the image I am trying to read:
> http://dl.dropbox.com/u/1531272/pg1-CROP.jpg
>
> Thanks for any help you can offer as I try to tweak this awesome
> product.
