On 08/20/2010 02:02 PM, Jimmy O'Regan wrote:
On 20 August 2010 12:53, colbec<[email protected]> wrote:
I'm using tesseract 3.00 on openSUSE 11.2, from the CLI as in:
tesseract file.tif file
In an image that contains a line of '=' signs, the recognition is much
worse than if that line is removed, e.g.:
line 1 and stuff
=======================
line 3 and stuff
Line 1 will be recognized, but either lines 2 and 3 will both be
missing, or line 2 will be missing and line 3 garbled.
If the file contains lines 1 and 3 only, the recognition is almost
perfect.
Since the "=" character appears to be in the trained charset, what
kind of error does this represent for tesseract?
At a guess - without a sample image, that's the best you can
expect - I would say that the line of equals signs is being treated as
noise.
I'm sorry there is no original image, but it is copyrighted and I don't
have permission to reproduce the original. The closest I can get is to
provide the example of the 3 lines as in the OP.