Hello all, I've noticed that OCRing an image that has been treated with the textcleaner script produces some errors that hurt the recognition accuracy.
First, I run the image through the script to enhance it and convert it to grayscale:

    textcleaner -g -e normalize -s 1 scan.tif scan2.tif

Then I OCR it with tesseract and get the following error messages:

    tesseract scan2.tif output2 -l eng
    Tesseract Open Source OCR Engine v3.02 with Leptonica
    Page 0
    Error in boxClipToRectangle: box outside rectangle
    Error in pixScanForForeground: invalid box
    Error in boxClipToRectangle: box outside rectangle
    Error in pixScanForForeground: invalid box
    Error in boxClipToRectangle: box outside rectangle
    Error in pixScanForForeground: invalid box

The first line of text in my original image reads:

    "than Phone: her line 1 | No feel good"

but the output2.txt file contains this instead:

    "thaeePleoecbakeorliIZB-TlNofoclgood"

It seems to me that some adjustments are needed. I have trained Tesseract on this font, included the dictionary, and followed all the steps in the guide. However, the output is not good at all, and the errors may point to something that I am doing wrong.

Could anyone share their experience and suggest some good practices for fixing this?

Thanks.
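For anyone trying to reproduce this, the two steps above could be scripted roughly as follows. This is only a sketch and not what was actually run: the helper names `build_commands`, `count_leptonica_errors` and `run_pipeline` are hypothetical, and it assumes that `textcleaner` and `tesseract` are both on the PATH and that the "Error in ..." messages arrive on stderr or stdout, one per line. It shows how to count the Leptonica messages per function so that runs with different textcleaner settings can be compared.

```python
import re
import subprocess
from collections import Counter

# Matches lines such as "Error in boxClipToRectangle: box outside rectangle".
ERROR_LINE = re.compile(r"^\s*Error in (\w+):")


def build_commands(src, cleaned, out_base, lang="eng"):
    """Return the two command lines from the post as argument lists."""
    clean_cmd = ["textcleaner", "-g", "-e", "normalize", "-s", "1", src, cleaned]
    ocr_cmd = ["tesseract", cleaned, out_base, "-l", lang]
    return clean_cmd, ocr_cmd


def count_leptonica_errors(log_text):
    """Count 'Error in <function>:' lines, keyed by function name."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def run_pipeline(src, cleaned, out_base, lang="eng"):
    """Run both steps and return the error counts from the tesseract run."""
    clean_cmd, ocr_cmd = build_commands(src, cleaned, out_base, lang)
    subprocess.run(clean_cmd, check=True)
    result = subprocess.run(ocr_cmd, capture_output=True, text=True)
    # Assumption: the messages may be on either stream, so scan both.
    return count_leptonica_errors(result.stderr + "\n" + result.stdout)
```

Calling `run_pipeline("scan.tif", "scan2.tif", "output2")` would then return the per-function counts for one run. A count that drops to zero after changing the preprocessing would suggest that the cleaned image, rather than the training data, is what triggers the messages.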

