Hi Michael,

This is a known issue: Tesseract does not handle very small, isolated text well by default. Usually one needs 4 or more characters. Have you tried different page segmentation modes (PSM)?

--Sven
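To make the PSM suggestion concrete, here is a minimal sketch that runs the same command line from the original message once per segmentation mode and collects the results. The helper names (`build_command`, `try_psms`) and the output naming scheme are made up for illustration. The candidate list assumes Tesseract's documented mode numbers (6 = single uniform block, 7 = single text line, 8 = single word, 10 = single character). The `-psm` spelling matches the command quoted below; newer Tesseract releases spell it `--psm`, so the flag is a parameter.

```python
import subprocess

# Candidate page segmentation modes (assumed from Tesseract's documented list):
# 6 = single uniform block, 7 = single text line, 8 = single word, 10 = single character.
CANDIDATE_PSMS = (6, 7, 8, 10)


def build_command(image, output_base, psm, flag="-psm"):
    """Return the argv list for one tesseract run with the given PSM."""
    return ["tesseract", image, output_base, flag, str(psm)]


def try_psms(image, psms=CANDIDATE_PSMS, flag="-psm"):
    """Run tesseract once per PSM and return {psm: recognized text}.

    Requires the tesseract binary on PATH; each run writes <output_base>.txt.
    """
    results = {}
    for psm in psms:
        output_base = f"{image}-psm{psm}"  # hypothetical naming scheme
        subprocess.run(build_command(image, output_base, psm, flag), check=True)
        with open(output_base + ".txt", encoding="utf-8") as fh:
            results[psm] = fh.read().strip()
    return results
```

Calling `try_psms("AB.png")` would then show side by side what each mode produces for the two-character image. Since every image in this application holds a single line of text, mode 7 looks like the most natural first candidate, though that is a guess rather than a tested result.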
On Mon, Jan 6, 2014 at 5:53 PM, Michael Beauregard <[email protected]> wrote:

> Hey everyone,
>
> I hesitate to post this as I'm likely just making rookie mistakes, but
> perhaps this particular test image will prove to be useful for learning
> about tesseract.
>
> My application uses domain specific constraints to pre-segment the blocks
> of interest and each image passed to tesseract will always contain a single
> line of text. The attached input image containing 'AB' is a good example of
> the type of images I expect to have after segmentation. Several images with
> phone numbers or addresses are correctly recognized by tesseract, but I was
> surprised to see that the output for the 'AB' image was completely wrong.
>
> Although I'm using the api in my application, I was able to reproduce the
> exact same results with the command line using the following command:
>
> tesseract AB.png AB-output -psm 6
>
> the resulting 'AB-output.txt' contains:
>
> Eā-3
>
> Having read through many past messages in the group, I'm worried that the
> only way to get reliable results from tesseract is to train it with my
> input images. However, considering that many other fields from this same
> label are interpreted correctly, I feel that there must be something else
> going on. Any help understanding what is going on here would be wonderful.
>
> Cheers,
>
> Michael
>
> --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
--
"All that is gold does not glitter,
not all those who wander are lost;
the old that is strong does not wither,
deep roots are not reached by the frost.
From the ashes a fire shall be woken,
a light from the shadows shall spring;
renewed shall be blade that was broken,
the crownless again shall be king."

