The command-line tesseract on that image does produce two lines. Mind you, the first line consists entirely of gibberish. Here's what I get:

.>’¢:>¢:>C)_§?
522960

That's on Linux, using tesseract version 2.04 with the "eng" language files.
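
For anyone who wants to repeat this check, here is a minimal sketch of running the command-line tesseract from Python and counting the lines it returns. It assumes a `tesseract` binary is on the PATH and that the image is in a format the installed version can read (older releases may need it converted to TIFF first). The helper names `nonempty_lines` and `ocr_lines` are made up for illustration and are not part of tesseract.

```python
import subprocess
import tempfile
from pathlib import Path


def nonempty_lines(text):
    """Return the non-blank lines of OCR output, stripped of surrounding whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def ocr_lines(image_path, lang="eng"):
    """Run the tesseract CLI on image_path and return its non-blank output lines.

    Assumes the usual invocation: tesseract <image> <output base> -l <lang>.
    """
    with tempfile.TemporaryDirectory() as tmp:
        out_base = Path(tmp) / "out"  # tesseract appends ".txt" to this base name
        subprocess.run(
            ["tesseract", str(image_path), str(out_base), "-l", lang],
            check=True,  # raise CalledProcessError if tesseract exits non-zero
        )
        text = out_base.with_suffix(".txt").read_text(
            encoding="utf-8", errors="replace"
        )
    return nonempty_lines(text)
```

Calling `ocr_lines("pg1-CROP.tif")` on a local copy of the image (the file name here is only an example) and taking `len()` of the result shows whether the text came back as one line or two.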


Jimmy O'Regan wrote, On 2010-07-02 13:30:
Honestly, I've no idea -- that should have come out as two separate lines.

Would you mind opening an issue for this? I don't really have a lot of time at the moment, and won't for the next few weeks, but if there's an open issue I'll be more likely to come back to it.

KAH wrote, On 2010-07-02 10:49:
I am trying to figure out why tesseract is not reading this image as two lines. Is there a variable I can set that will tell the process to treat the vertical gap as a line break, rather than reading it all as one word?

Here is the image I am trying to read: http://dl.dropbox.com/u/1531272/pg1-CROP.jpg

Thanks for any help you can offer as I try to tweak this awesome product.

