Hi Michael,
This is a known issue: Tesseract does not handle very short, isolated text
well by default, and it usually needs 4 or more characters. Have you tried
different page segmentation modes (PSM)?
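A quick way to compare modes on your image is a small loop like the one
below. This is a minimal sketch, not an official tool: the helper name
try_psm_modes is made up for illustration, and it assumes a tesseract 3.x
binary on the PATH that accepts "-psm N" as in your command (tesseract 4
and later spell it "--psm N"). It tries mode 6 (single uniform block of
text), 7 (single text line) and 8 (single word).

```shell
#!/bin/sh
# Hypothetical helper: run tesseract on one image with several page
# segmentation modes and print the text each mode produced.
# Assumes tesseract 3.x syntax ("-psm N"); tesseract 4+ uses "--psm N".
try_psm_modes() {
    img=$1
    base=${img%.*}    # strip the extension, e.g. AB.png -> AB
    for psm in 6 7 8; do
        # 6 = single uniform block of text
        # 7 = single text line
        # 8 = single word
        tesseract "$img" "$base-psm$psm" -psm "$psm" 2>/dev/null
        # tesseract writes its result to <output base>.txt
        printf 'psm %s: %s\n' "$psm" "$(cat "$base-psm$psm.txt" 2>/dev/null)"
    done
}
```

After defining the function (for example by sourcing the script), run
try_psm_modes AB.png and compare the lines it prints. Since your images are
already cut down to a single line, mode 7 is probably the first one worth
looking at.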
--Sven


On Mon, Jan 6, 2014 at 5:53 PM, Michael Beauregard wrote:

> Hey everyone,
>
> I hesitate to post this as I'm likely just making rookie mistakes, but
> perhaps this particular test image will prove to be useful for learning
> about tesseract.
>
> My application uses domain-specific constraints to pre-segment the blocks
> of interest and each image passed to tesseract will always contain a single
> line of text. The attached input image containing 'AB' is a good example of
> the type of images I expect to have after segmentation. Several images with
> phone numbers or addresses are correctly recognized by tesseract, but I was
> surprised to see that the output for the 'AB' image was completely wrong.
>
> Although I'm using the API in my application, I was able to reproduce the
> exact same results with the command line using the following command:
>
> tesseract AB.png AB-output -psm 6
>
>
> The resulting 'AB-output.txt' contains:
>
> E’-3
>
>
> Having read through many past messages in the group, I'm worried that the
> only way to get reliable results from tesseract is to train it with my
> input images. However, considering that many other fields from this same
> label are interpreted correctly, I feel that there must be something else
> going on. Any help understanding what is going on here would be wonderful.
>
> Cheers,
>
> Michael
>
> --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>



-- 
“All that is gold does not glitter,
  not all those who wander are lost;
the old that is strong does not wither,
  deep roots are not reached by the frost.
From the ashes a fire shall be woken,
  a light from the shadows shall spring;
renewed shall be blade that was broken,
  the crownless again shall be king.”
