The command-line tesseract on that image does produce two lines. Mind
you, the first line consists entirely of gibberish. Here's what I get:
.>’¢:>¢:>C)_§?
522960
That's on Linux, using tesseract version 2.04 with the "eng" language files.
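
For anyone who wants to reproduce the line count, here is a rough sketch of how the command-line run could be scripted. It assumes the classic invocation form `tesseract <image> <outputbase> -l <lang>`, which writes its result to `<outputbase>.txt`. The helper names (`build_command`, `nonempty_lines`, `ocr_lines`) and the default output base `ocr_out` are made up for illustration and are not part of tesseract.

```python
import subprocess
from pathlib import Path


def build_command(image_path, output_base, lang="eng"):
    # Assumed CLI form: tesseract <image> <outputbase> -l <lang>
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def nonempty_lines(text):
    # Drop blank lines so only lines with recognised text are counted.
    return [line for line in text.splitlines() if line.strip()]


def ocr_lines(image_path, output_base="ocr_out", lang="eng"):
    # Run the tesseract binary found on PATH; raises if it exits non-zero.
    subprocess.run(build_command(image_path, output_base, lang), check=True)
    # tesseract appends ".txt" to the output base it is given.
    text = Path(f"{output_base}.txt").read_text(encoding="utf-8")
    return nonempty_lines(text)
```

Calling `ocr_lines(...)` on a local copy of the image and checking the length of the returned list shows whether the text came out as one line or two. Depending on the tesseract build, the image may first need converting to a format it can read.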
Jimmy O'Regan wrote, On 2010-07-02 13:30:
Honestly, I've no idea -- that should have come out as two separate lines.
Would you mind opening an issue for this? I don't really have a lot of
time at the moment, and won't for the next few weeks, but if there's
an open issue I'll be more likely to come back to it.
KAH wrote, On 2010-07-02 10:49:
I am trying to figure out why tesseract is not reading this image as
two lines.
Is there a variable I can set that tells the process to treat the
vertical gap as whitespace, rather than reading it all as one word?
Here is the image I am trying to read:
http://dl.dropbox.com/u/1531272/pg1-CROP.jpg
Thanks for any help you can offer as I try to tweak this awesome product.