Hey everyone,

I hesitate to post this as I'm likely just making rookie mistakes, but 
perhaps this particular test image will prove to be useful for learning 
about tesseract.

My application uses domain-specific constraints to pre-segment the blocks 
of interest, so each image passed to tesseract always contains a single 
line of text. The attached input image containing 'AB' is a good example of 
the kind of image I expect to have after segmentation. Several images with 
phone numbers or addresses are recognized correctly by tesseract, but I was 
surprised to see that the output for the 'AB' image was completely wrong. 

Although I'm using the API in my application, I was able to reproduce the 
same result from the command line with the following command:

tesseract AB.png AB-output -psm 6


The resulting 'AB-output.txt' contains:

E’-3
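
For anyone who wants to reproduce this from a script, here is a minimal 
Python sketch of the same command-line call. The helper names 
build_tesseract_command and run_tesseract are hypothetical, and the sketch 
assumes the tesseract binary is on the PATH and accepts the -psm flag in 
the form used in the command above. It is an illustration under those 
assumptions, not the way the application itself calls the API.

```python
import subprocess


def build_tesseract_command(image_path, output_base, psm=6):
    """Return the argument list for the command-line call shown above.

    Hypothetical helper: it only assembles arguments and runs nothing.
    """
    # Same argument order as: tesseract AB.png AB-output -psm 6
    return ["tesseract", image_path, output_base, "-psm", str(psm)]


def run_tesseract(image_path, output_base, psm=6):
    """Run tesseract and return the text written to <output_base>.txt."""
    # Assumes the tesseract binary is installed and on the PATH.
    command = build_tesseract_command(image_path, output_base, psm)
    subprocess.run(command, check=True)
    # tesseract appends ".txt" to the output base name.
    with open(output_base + ".txt", encoding="utf-8") as handle:
        return handle.read().strip()
```

The psm parameter is there so that other page segmentation modes can be 
tried against the same image. Mode 6 assumes a single uniform block of 
text; mode 7, which tesseract documents as treating the image as a single 
text line, is one candidate given the single-line inputs described above.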


Having read through many past messages in the group, I'm worried that the 
only way to get reliable results from tesseract is to train it on my 
input images. However, since many other fields from this same label are 
interpreted correctly, I suspect something else is going on. Any help 
understanding why this image fails would be wonderful. 

Cheers,

Michael

-- 
-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en

<<attachment: AB.png>>
