Hey everyone, I hesitate to post this as I'm likely just making rookie mistakes, but perhaps this particular test image will prove to be useful for learning about tesseract.
My application uses domain-specific constraints to pre-segment the blocks of interest, so each image passed to tesseract always contains a single line of text. The attached input image containing 'AB' is a good example of the kind of image I expect to have after segmentation. Several images with phone numbers or addresses are recognized correctly by tesseract, but I was surprised to see that the output for the 'AB' image was completely wrong.

Although I'm using the API in my application, I was able to reproduce exactly the same result from the command line with the following command:

    tesseract AB.png AB-output -psm 6

The resulting 'AB-output.txt' contains:

    E’-3

Having read through many past messages in the group, I'm worried that the only way to get reliable results from tesseract is to train it with my input images. However, considering that many other fields from this same label are interpreted correctly, I feel that there must be something else going on.

Any help understanding what is going on here would be wonderful.

Cheers,
Michael

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
<<attachment: AB.png>>
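P.S. In case it makes the problem easier to reproduce, here is a rough Python sketch of how the command above could be rerun under a few page segmentation modes. The helper names (build_command, run_ocr, compare_modes) are my own invention for this post and are not part of tesseract. It assumes the tesseract binary is on the PATH and accepts the same '-psm' flag as my install (as far as I know, newer releases spell it '--psm'). Modes 7 and 8 are the "single text line" and "single word" modes listed in the help text; I have not confirmed that they change the result for this image.

```python
import subprocess
from pathlib import Path

# Page segmentation modes as listed in tesseract's help text.
PSM_SINGLE_BLOCK = 6  # assume a single uniform block of text (the mode I used)
PSM_SINGLE_LINE = 7   # treat the image as a single text line
PSM_SINGLE_WORD = 8   # treat the image as a single word


def build_command(image, output_base, psm):
    """Return the argv list for one tesseract run (old-style '-psm' flag)."""
    return ["tesseract", str(image), str(output_base), "-psm", str(psm)]


def run_ocr(image, psm):
    """Run tesseract on `image` with the given mode and return the text."""
    # Hypothetical naming scheme: AB.png with mode 7 writes AB-psm7.txt.
    output_base = "%s-psm%d" % (Path(image).with_suffix("").name, psm)
    subprocess.run(build_command(image, output_base, psm), check=True)
    return Path(output_base + ".txt").read_text(encoding="utf-8").strip()


def compare_modes(image, modes=(PSM_SINGLE_BLOCK, PSM_SINGLE_LINE, PSM_SINGLE_WORD)):
    """Return a dict mapping each mode to the text tesseract produced."""
    return {psm: run_ocr(image, psm) for psm in modes}
```

Calling compare_modes("AB.png") from the directory holding the image should give one result per mode, which at least shows whether the segmentation setting or the recognition itself is at fault.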

